Artificial Hippocampus Networks for Efficient Long-Context Modeling
Long-sequence modeling faces a fundamental trade-off between the efficiency of compressive fixed-size memory in RNN-like models and the fidelity of lossless growing memory in attention-based Transformers. Inspired by the Multi-Store Model in cognitive science, we introduce a memory framework of artificial neural networks. Our method maintains a sliding window of the Transformer's KV cache as lossless short-term memory, while a learnable module termed Artificial Hippocampus Network (AHN) recurrently compresses out-of-window information into a fixed-size compact long-term memory. To validate this framework, we instantiate AHNs using modern RNN-like architectures, including Mamba2, DeltaNet, and Gated DeltaNet. Extensive experiments on long-context benchmarks LV-Eval and InfiniteBench demonstrate that AHN-augmented models consistently outperform sliding window baselines and achieve performance comparable or even superior to full-attention models, while substantially reducing computational and memory requirements. For instance, augmenting the Qwen2.5-3B-Instruct with AHNs reduces inference FLOPs by 40.5% and memory cache by 74.0%, while improving its average score on LV-Eval (128k sequence length) from 4.41 to 5.88. Code is available at: https://github.com/ByteDance-Seed/AHN.
Updated: 2025-10-08 17:59:55
标题: 人工海马网络用于高效的长上下文建模
摘要: 长序列建模面临着一个基本的折衷,即在类似RNN的模型中压缩固定大小内存的效率与基于注意力的Transformer中无损增长内存的保真度之间的折衷。受认知科学中的多存储模型的启发,我们引入了一种人工神经网络的记忆框架。我们的方法将Transformer的KV缓存维持为无损短期内存的滑动窗口,同时一个可学习的模块,称为人工海马网络(AHN),循环地将窗口外的信息压缩成固定大小的紧凑长期记忆。为了验证这一框架,我们使用现代RNN-like架构实例化AHNs,包括Mamba2、DeltaNet和Gated DeltaNet。对长上下文基准LV-Eval和InfiniteBench的大量实验表明,AHN增强模型始终优于滑动窗口基线,并且实现了与全注意力模型相媲美甚至更优的性能,同时大幅减少了计算和内存需求。例如,用AHNs增强Qwen2.5-3B-Instruct将推理FLOPs减少了40.5%,内存缓存减少了74.0%,同时将其在LV-Eval(128k序列长度)上的平均分数从4.41提高到5.88。代码可在以下链接找到:https://github.com/ByteDance-Seed/AHN。
更新时间: 2025-10-08 17:59:55
领域: cs.CL,cs.AI,cs.LG
NdLinear: Preserving Multi-Dimensional Structure for Parameter-Efficient Neural Networks
In deep learning, processing multidimensional inputs (e.g., images, medical scans, and time series) is an important task that often requires flattening the inputs. We introduce $\mathit{NdLinear}$, a drop-in replacement for linear layers that operates directly on tensors, requiring no flattening. By applying transformations separately along each dimension, NdLinear preserves native data structure while achieving dramatic parameter reductions, often by orders of magnitude, with minimal memory overhead. We prove NdLinear maintains expressivity through structured Tucker decomposition while preserving VC-dimension scaling. Extensive experiments demonstrate NdLinear's capacity to achieve significant parameter reductions with substantial wall-clock efficiency gains and minimal memory overhead. For instance, our $\mathit{NdLinear-LoRA}$ matches or exceeds standard LoRA on language reasoning tasks using up to $9\times$ fewer parameters. Experiments across CNNs, RNNs, Transformers, and MLPs on vision, language, time-series, and tabular tasks consistently demonstrate NdLinear's efficiency gains. While excelling at axis-separable tasks, NdLinear has limitations with entangled spatial interactions. By processing data in its original N-dimensional form, NdLinear provides a theoretically grounded, practical component for building more efficient neural architectures.
Updated: 2025-10-08 17:59:37
标题: NdLinear:保留多维结构以实现参数高效的神经网络
摘要: 在深度学习中,处理多维输入(如图像、医学扫描和时间序列)是一个重要任务,通常需要将输入展平。我们引入了$\mathit{NdLinear}$,这是一个直接在张量上操作的线性层的替代品,不需要展平。通过沿着每个维度分别应用变换,$\mathit{NdLinear}$保留了原始数据结构,同时实现了显著的参数减少,往往是数量级的降低,且内存开销最小。我们证明了$\mathit{NdLinear}$通过结构化的Tucker分解来维持表达能力,同时保持了VC-维度的扩展。广泛的实验表明,$\mathit{NdLinear}$能够实现显著的参数减少,同时获得实质性的墙钟效率提升和最小的内存开销。例如,我们的$\mathit{NdLinear-LoRA}$在语言推理任务中使用的参数数量少至多达$9\times$,与标准LoRA相匹配或超过。在视觉、语言、时间序列和表格任务上,跨CNNs、RNNs、Transformers和MLPs的实验一致表明$\mathit{NdLinear}$的效率提升。虽然在轴分离任务上表现出色,但$\mathit{NdLinear}$在空间相互作用纠缠方面有限。通过以其原始的N维形式处理数据,$\mathit{NdLinear}$为构建更有效的神经结构提供了一个理论上扎实的、实用的组件。
更新时间: 2025-10-08 17:59:37
领域: cs.LG,cs.AI
Vibe Checker: Aligning Code Evaluation with Human Preference
Large Language Models (LLMs) have catalyzed vibe coding, where users leverage LLMs to generate and iteratively refine code through natural language interactions until it passes their vibe check. Vibe check is tied to real-world human preference and goes beyond functionality: the solution should feel right, read cleanly, preserve intent, and remain correct. However, current code evaluation remains anchored to pass@k and captures only functional correctness, overlooking the non-functional instructions that users routinely apply. In this paper, we hypothesize that instruction following is the missing piece underlying vibe check that represents human preference in coding besides functional correctness. To quantify models' code instruction following capabilities with measurable signals, we present VeriCode, a taxonomy of 30 verifiable code instructions together with corresponding deterministic verifiers. We use the taxonomy to augment established evaluation suites, resulting in Vibe Checker, a testbed to assess both code instruction following and functional correctness. Upon evaluating 31 leading LLMs, we show that even the strongest models struggle to comply with multiple instructions and exhibit clear functional regression. Most importantly, a composite score of functional correctness and instruction following correlates the best with human preference, with the latter emerging as the primary differentiator on real-world programming tasks. Our work identifies core factors of the vibe check, providing a concrete path for benchmarking and developing models that better align with user preferences in coding.
Updated: 2025-10-08 17:59:19
标题: Vibe Checker: 将代码评估与人类偏好对齐
摘要: 大型语言模型(LLMs)已经催生了“vibe coding”,用户利用LLMs生成并通过自然语言交互迭代地完善代码,直到通过他们的“vibe check”。 “vibe check”与现实世界的人类偏好相关,超越功能性:解决方案应该感觉正确,可读性强,保留意图,并保持正确性。然而,当前的代码评估仍然囿于pass@k,并且仅捕捉功能正确性,忽视用户常规应用的非功能性指令。在本文中,我们假设指令遵循是构成vibe check的缺失部分,代表了除功能正确性之外的编码中人类偏好。为了通过可测信号量化模型的代码指令遵循能力,我们提出了VeriCode,一个包含30个可验证代码指令以及相应确定性验证器的分类法。我们使用这个分类法来增强已建立的评估套件,形成了Vibe Checker,一个用于评估代码指令遵循和功能正确性的实验平台。在评估了31个领先的LLMs之后,我们发现即使最强大的模型也难以遵循多个指令,并呈现明显的功能退化。最重要的是,功能正确性和指令遵循的复合评分与人类偏好最相关,后者在现实世界编程任务中成为主要区分因素。我们的工作确定了vibe check的核心因素,为基准测试和开发更符合用户编码偏好的模型提供了具体路径。
更新时间: 2025-10-08 17:59:19
领域: cs.CL,cs.AI,cs.LG,cs.SE
h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning
Large language models excel at short-horizon reasoning tasks, but performance drops as reasoning horizon lengths increase. Existing approaches to combat this rely on inference-time scaffolding or costly step-level supervision, neither of which scales easily. In this work, we introduce a scalable method to bootstrap long-horizon reasoning capabilities using only existing, abundant short-horizon data. Our approach synthetically composes simple problems into complex, multi-step dependency chains of arbitrary length. We train models on this data using outcome-only rewards under a curriculum that automatically increases in complexity, allowing RL training to be scaled much further without saturating. Empirically, our method generalizes remarkably well: curriculum training on composed 6th-grade level math problems (GSM8K) boosts accuracy on longer, competition-level benchmarks (GSM-Symbolic, MATH-500, AIME) by up to 2.06x. Importantly, our long-horizon improvements are significantly higher than baselines even at high pass@k, showing that models can learn new reasoning paths under RL. Theoretically, we show that curriculum RL with outcome rewards achieves an exponential improvement in sample complexity over full-horizon training, providing training signal comparable to dense supervision. h1 therefore introduces an efficient path towards scaling RL for long-horizon problems using only existing data.
Updated: 2025-10-08 17:58:41
标题: 使用强化学习通过引导LLMs对更长期未来进行推理
摘要: 大型语言模型在短期推理任务上表现出色,但随着推理视野长度的增加,性能下降。现有的解决方法依赖于推理时间支撑或昂贵的步骤级监督,这两种方法都不容易扩展。在这项工作中,我们介绍了一种可扩展的方法,只使用现有丰富的短期数据来启动长期推理能力。我们的方法将简单问题合成为复杂的、任意长度的多步依赖链。我们使用只有结果奖励的课程对模型在这些数据上进行训练,该课程会自动增加复杂性,使RL训练能够进一步扩展而不会饱和。在实证方面,我们的方法具有非常好的泛化能力:对组成的六年级水平数学问题(GSM8K)进行课程培训,将长期、竞赛级别基准(GSM-Symbolic、MATH-500、AIME)的准确性提高了最多2.06倍。重要的是,我们的长期改进甚至在高pass@k时也比基线显著更高,表明模型可以在RL下学习新的推理路径。从理论上讲,我们展示了具有结果奖励的课程RL在样本复杂性上实现了指数级的改进,提供了与密集监督相媲美的训练信号。因此,h1引入了一种仅使用现有数据来扩展RL解决长期问题的高效路径。
更新时间: 2025-10-08 17:58:41
领域: cs.LG,cs.AI
MLE-Smith: Scaling MLE Tasks with Automated Multi-Agent Pipeline
While Language Models (LMs) have made significant progress in automating machine learning engineering (MLE), the acquisition of high-quality MLE training data is significantly constrained. Current MLE benchmarks suffer from low scalability and limited applicability because they rely on static, manually curated tasks, demanding extensive time and manual effort to produce. We introduce MLE-Smith, a fully automated multi-agent pipeline, to transform raw datasets into competition-style MLE challenges through an efficient generate-verify-execute paradigm for scaling MLE tasks with verifiable quality, real-world usability, and rich diversity. The proposed multi-agent pipeline in MLE-Smith drives structured task design and standardized refactoring, coupled with a hybrid verification mechanism that enforces strict structural rules and high-level semantic soundness. It further validates empirical solvability and real-world fidelity through interactive execution. We apply MLE-Smith to 224 of real-world datasets and generate 606 tasks spanning multiple categories, objectives, and modalities, demonstrating that MLE-Smith can work effectively across a wide range of real-world datasets. Evaluation on the generated tasks shows that the performance of eight mainstream and cutting-edge LLMs on MLE-Smith tasks is strongly correlated with their performance on carefully human-designed tasks, highlighting the effectiveness of the MLE-Smith to scaling up MLE tasks, while maintaining task quality.
Updated: 2025-10-08 17:57:19
标题: MLE-Smith:使用自动化多智能体管道扩展MLE任务
摘要: 语言模型(LMs)在自动化机器学习工程(MLE)方面取得了显著进展,但高质量的MLE训练数据的获取受到了显著限制。当前的MLE基准受到低可扩展性和有限适用性的影响,因为它们依赖于静态、手工策划的任务,需要大量时间和人工努力才能产生。我们引入了MLE-Smith,这是一个完全自动化的多代理管道,通过高效的生成-验证-执行范式,将原始数据集转化为竞赛风格的MLE挑战,以实现可验证的质量、实用性和丰富的多样性。MLE-Smith中提出的多代理管道推动了结构化任务设计和标准重构,结合了强制执行严格结构规则和高级语义合理性的混合验证机制。它通过交互式执行进一步验证了经验可解性和真实世界的忠实度。我们将MLE-Smith应用于224个真实世界数据集,并生成了606个跨多个类别、目标和模态的任务,展示了MLE-Smith可以在多种真实世界数据集上有效工作。对生成的任务进行评估表明,八个主流和尖端的LLMs在MLE-Smith任务上的表现与它们在精心设计的任务上的表现强相关,突出了MLE-Smith在扩大MLE任务规模的同时保持任务质量的有效性。
更新时间: 2025-10-08 17:57:19
领域: cs.LG,cs.AI
Streamlining Plug-and-Charge Authorization for Electric Vehicles with OAuth2 and OIDC
The Plug-and-Charge (PnC) process defined by ISO 15118 standardizes automated Electric Vehicle (EV) charging by enabling automatic installation of credentials and use for authentication between EV and Charge Point (CP). However, the current credential installation process is non-uniform, relies on a complex Public Key Infrastructure (PKI), lacks support for fine-grained authorization parameters, and is not very user-friendly. In this paper, we propose a streamlined approach to the initial charging authorization process by leveraging the OAuth Device Authorization Grant and Rich Authorization Requests. The proposed solution reduces technical complexity, simplifies credential installation, introduces flexible authorization constraints (e.g., time- and cost-based), and facilitates payment through OpenID Connect (OIDC). We present a proof-of-concept implementation along with performance evaluations and conduct a symbolic protocol verification using the Tamarin prover. Furthermore, our approach solves the issue of OAuth's cross-device authorization, making it suitable as a formally proven blueprint in contexts beyond EV charging.
Updated: 2025-10-08 17:57:02
标题: 使用OAuth2和OIDC简化电动汽车的插入式充电授权
摘要: ISO 15118标准定义的插拔式充电(PnC)过程通过实现电动汽车(EV)和充电桩(CP)之间的自动安装凭证和用于认证的自动化,标准化了电动汽车的充电。然而,当前的凭证安装过程不统一,依赖复杂的公钥基础设施(PKI),缺乏对细粒度授权参数的支持,而且用户体验不友好。在本文中,我们提出了一种简化初始充电授权流程的方法,利用OAuth设备授权授予和Rich授权请求。所提出的解决方案降低了技术复杂性,简化了凭证安装,引入了灵活的授权约束条件(例如,基于时间和成本),并通过OpenID Connect(OIDC)促进了支付。我们提出了一个概念验证实现,并进行了性能评估,使用Tamarin证明器进行了符号协议验证。此外,我们的方法解决了OAuth跨设备授权的问题,使其适用于EV充电之外的环境中作为形式上证明的蓝图。
更新时间: 2025-10-08 17:57:02
领域: cs.CR
Cocoon: A System Architecture for Differentially Private Training with Correlated Noises
Machine learning (ML) models memorize and leak training data, causing serious privacy issues to data owners. Training algorithms with differential privacy (DP), such as DP-SGD, have been gaining attention as a solution. However, DP-SGD adds a noise at each training iteration, which degrades the accuracy of the trained model. To improve accuracy, a new family of approaches adds carefully designed correlated noises, so that noises cancel out each other across iterations. We performed an extensive characterization study of these new mechanisms, for the first time to the best of our knowledge, and show they incur non-negligible overheads when the model is large or uses large embedding tables. Motivated by the analysis, we propose Cocoon, a hardware-software co-designed framework for efficient training with correlated noises. Cocoon accelerates models with embedding tables through pre-computing and storing correlated noises in a coalesced format (Cocoon-Emb), and supports large models through a custom near-memory processing device (Cocoon-NMP). On a real system with an FPGA-based NMP device prototype, Cocoon improves the performance by 2.33-10.82x(Cocoon-Emb) and 1.55-3.06x (Cocoon-NMP).
Updated: 2025-10-08 17:56:30
标题: 茧:一种用于具有相关噪声的差分私密训练的系统架构
摘要: 机器学习(ML)模型记忆并泄漏训练数据,给数据所有者带来严重的隐私问题。使用差分隐私(DP)的训练算法,如DP-SGD,作为解决方案已经引起了关注。然而,DP-SGD在每次训练迭代中添加噪音,会降低训练模型的准确性。为了提高准确性,一种新的方法家族添加了精心设计的相关噪音,以便在迭代中相互抵消噪音。我们进行了对这些新机制的广泛表征研究,据我们所知,这是首次,并展示了当模型较大或使用大型嵌入表时,它们会产生不可忽略的开销。在分析的基础上,我们提出了Cocoon,一个硬件软件协同设计框架,用于有效训练相关噪音。Cocoon通过预先计算和存储相关噪音的融合格式(Cocoon-Emb)来加速具有嵌入表的模型,并通过自定义近存储处理设备(Cocoon-NMP)来支持大型模型。在一个带有基于FPGA的NMP设备原型的真实系统上,Cocoon可以提高性能2.33-10.82倍(Cocoon-Emb)和1.55-3.06倍(Cocoon-NMP)。
更新时间: 2025-10-08 17:56:30
领域: cs.AR,cs.AI,cs.CR,cs.LG
Valid Inference with Imperfect Synthetic Data
Predictions and generations from large language models are increasingly being explored as an aid in limited data regimes, such as in computational social science and human subjects research. While prior technical work has mainly explored the potential to use model-predicted labels for unlabeled data in a principled manner, there is increasing interest in using large language models to generate entirely new synthetic samples (e.g., synthetic simulations), such as in responses to surveys. However, it remains unclear by what means practitioners can combine such data with real data and yet produce statistically valid conclusions upon them. In this paper, we introduce a new estimator based on generalized method of moments, providing a hyperparameter-free solution with strong theoretical guarantees to address this challenge. Intriguingly, we find that interactions between the moment residuals of synthetic data and those of real data (i.e., when they are predictive of each other) can greatly improve estimates of the target parameter. We validate the finite-sample performance of our estimator across different tasks in computational social science applications, demonstrating large empirical gains.
Updated: 2025-10-08 17:56:19
标题: 利用不完美的合成数据进行有效推理
摘要: 大型语言模型的预测和生成越来越被认为是在有限数据环境中的一种辅助工具,例如在计算社会科学和人类研究中。尽管先前的技术工作主要探讨了以合理方式使用模型预测的标签来处理未标记数据的潜力,但越来越多的人对使用大型语言模型生成全新的合成样本(例如对调查的回应等)表现出兴趣。然而,目前尚不清楚从业者如何将这些数据与真实数据结合,并对它们产生统计上有效的结论。在本文中,我们介绍了一种基于广义矩估计法的新估计器,提供了一个不需要超参数的解决方案,并具有强大的理论保证来应对这一挑战。有趣的是,我们发现合成数据的矩残差与真实数据的矩残差之间的相互作用(即它们相互预测)可以极大地改善目标参数的估计。我们验证了我们的估计器在计算社会科学应用中不同任务的有限样本性能,展示了巨大的经验收益。
更新时间: 2025-10-08 17:56:19
领域: cs.LG,cs.AI,stat.ML
Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents
Recent progress in reasoning with large language models (LLMs), such as DeepSeek-R1, demonstrates impressive capabilities in domains like mathematics and coding, by exhibiting complex cognitive behaviors such as verification, goal decomposition, and self-reflection. However, it is unclear what behavior is effective and what behavior is missing for long-horizon AI agents tasks. In this work, we propose Dyna-Think, a thinking framework that integrates planning with an internal world model with reasoning and acting to enhance AI agent performance. To enable Dyna-Think, we propose Dyna-Think Imitation Learning (DIT) and Dyna-Think Dyna Training (DDT). To initialize a policy with Dyna-Think, DIT reconstructs the thinking process of R1 to focus on performing world model simulation relevant to the proposed (and planned) action, and trains the policy using this reconstructed data. To enhance Dyna-Think, DDT uses a two-stage training process to first improve the agent's world modeling ability via objectives such as state prediction or critique generation, and then improve the agent's action via policy training. We evaluate our methods on OSWorld and WindowsAgentArena, and demonstrate that Dyna-Think improves the agent's in-domain and out-of-domain performance, achieving similar best-of-n performance compared to R1 while generating 2x less tokens on average. Our extensive empirical studies reveal that 1) using critique generation for world model training is effective to improve policy performance; and 2) AI agents with better performance correlate with better world modeling abilities. We believe our results suggest a promising research direction to integrate world model simulation into AI agents to enhance their reasoning, planning, and acting capabilities.
Updated: 2025-10-08 17:49:53
标题: Dyna-Think:在AI代理中协同推理、行动和世界模型模拟
摘要: 最近在利用大型语言模型(LLMs)进行推理方面取得了一些进展,例如DeepSeek-R1,在数学和编码等领域展示了令人印象深刻的能力,通过展示验证、目标分解和自我反思等复杂的认知行为。然而,目前尚不清楚对于长期AI代理任务来说哪些行为是有效的,哪些是缺失的。在这项工作中,我们提出了Dyna-Think,这是一个将规划与内部世界模型、推理和行动相结合的思考框架,以增强AI代理的性能。为了实现Dyna-Think,我们提出了Dyna-Think模仿学习(DIT)和Dyna-Think动态训练(DDT)。为了初始化一个具有Dyna-Think的策略,DIT重构了R1的思考过程,专注于执行与提出的(和计划的)行动相关的世界模型模拟,并使用这些重构的数据训练策略。为了增强Dyna-Think,DDT使用两阶段训练过程,首先通过目标(如状态预测或批评生成)改善代理的世界建模能力,然后通过策略训练改善代理的行动。我们在OSWorld和WindowsAgentArena上评估了我们的方法,并证明了Dyna-Think提高了代理在领域内和领域外的性能,实现了与R1相似的最佳性能,同时平均生成的标记量减少了两倍。我们的广泛实证研究表明:1)使用批评生成来进行世界模型训练对于改善策略性能是有效的;2)表现更好的AI代理与更好的世界建模能力相关。我们相信我们的结果表明了一个有希望的研究方向,即将世界模型模拟整合到AI代理中,以增强它们的推理、规划和行动能力。
更新时间: 2025-10-08 17:49:53
领域: cs.AI,cs.CL,cs.LG
BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model Pretraining
Effective data selection is essential for pretraining large language models (LLMs), enhancing efficiency and improving generalization to downstream tasks. However, existing approaches often require leveraging external pretrained models, making it difficult to disentangle the effects of data selection from those of the external pretrained models. In addition, they often overlook the long-term impact of selected data if the model is trained to convergence, primarily due to the prohibitive cost of full-scale LLM pretraining. In this paper, we introduce BLISS (\textbf{B}ileve\textbf{L} \textbf{I}nfluence \textbf{S}coring method for data \textbf{S}election): a lightweight data selection method that operates entirely \emph{from scratch}, without relying on any external pretrained oracle models, while explicitly accounting for the long-term impact of selected data. BLISS leverages a small proxy model as a surrogate for the LLM and employs a score model to estimate the long-term influence of training samples if the proxy model is trained to convergence. We formulate data selection as a bilevel optimization problem, where the upper-level objective optimizes the score model to assign importance weights to training samples, ensuring that minimizing the lower-level objective (i.e., training the proxy model over the weighted training loss until convergence) leads to best validation performance. Once optimized, the trained score model predicts influence scores for the dataset, enabling efficient selection of high-quality samples for LLM pretraining. We validate BLISS by pretraining 410M/1B/2.8B Pythia and LLaMA-0.5B models on selected subsets of the C4 dataset. Notably, under the 1B model setting, BLISS achieves $1.7\times$ speedup in reaching the same performance as the state-of-the-art method, demonstrating superior performance across multiple downstream tasks.
Updated: 2025-10-08 17:49:49
标题: BLISS: 一种轻量级的双层影响评分方法,用于语言模型预训练中的数据选择
摘要: 有效的数据选择对于预训练大型语言模型(LLMs)至关重要,可以提高效率并改善对下游任务的泛化能力。然而,现有方法通常需要利用外部预训练模型,这使得很难将数据选择的影响与外部预训练模型的影响分离开来。此外,它们通常忽视了选择数据的长期影响,如果模型被训练到收敛,主要是因为全尺度LLM预训练的成本太高。在本文中,我们介绍了BLISS (BileveL Influence Scoring method for data Selection):一种轻量级的数据选择方法,完全从零开始运行,不依赖于任何外部预训练的神经网络模型,同时明确考虑所选数据的长期影响。BLISS利用一个小型代理模型作为LLM的代理,并使用一个评分模型来估计训练样本的长期影响,如果代理模型被训练到收敛。我们将数据选择建模为一个双层优化问题,其中上层目标是优化评分模型,为训练样本分配重要性权重,确保最小化下层目标(即,根据加权训练损失训练代理模型直到收敛)可以实现最佳的验证性能。一旦优化完成,训练好的评分模型可以预测数据集的影响分数,从而有效地选择高质量的样本进行LLM预训练。我们通过在C4数据集的选定子集上预训练410M/1B/2.8B Pythia和LLaMA-0.5B模型来验证BLISS。值得注意的是,在1B模型设置下,BLISS实现了1.7倍速度提升,达到了与最先进方法相同性能的速度,并在多个下游任务中展现出卓越的性能。
更新时间: 2025-10-08 17:49:49
领域: cs.LG
SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models
Proteins play crucial roles in almost all biological processes. The advancement of deep learning has greatly accelerated the development of protein foundation models, leading to significant successes in protein understanding and design. However, the lack of systematic red-teaming for these models has raised serious concerns about their potential misuse, such as generating proteins with biological safety risks. This paper introduces SafeProtein, the first red-teaming framework designed for protein foundation models to the best of our knowledge. SafeProtein combines multimodal prompt engineering and heuristic beam search to systematically design red-teaming methods and conduct tests on protein foundation models. We also curated SafeProtein-Bench, which includes a manually constructed red-teaming benchmark dataset and a comprehensive evaluation protocol. SafeProtein achieved continuous jailbreaks on state-of-the-art protein foundation models (up to 70% attack success rate for ESM3), revealing potential biological safety risks in current protein foundation models and providing insights for the development of robust security protection technologies for frontier models. The codes will be made publicly available at https://github.com/jigang-fan/SafeProtein.
Updated: 2025-10-08 17:47:56
标题: 安全蛋白质:蛋白质基础模型的红队框架和基准
摘要: 蛋白质在几乎所有生物过程中起着至关重要的作用。深度学习的进步极大加快了蛋白质基础模型的发展,取得了在蛋白质理解和设计方面的重大成功。然而,这些模型缺乏系统的红队测试,引发了人们对其潜在滥用的严重担忧,比如生成具有生物安全风险的蛋白质。本文介绍了SafeProtein,据我们所知,这是第一个专为蛋白质基础模型设计的红队框架。SafeProtein结合了多模态提示工程和启发式搜索,系统地设计红队方法并对蛋白质基础模型进行测试。我们还整理了SafeProtein-Bench,其中包括手工构建的红队基准数据集和全面的评估协议。SafeProtein在最先进的蛋白质基础模型上取得了持续的越狱成功(对于ESM3的攻击成功率高达70%),揭示了当前蛋白质基础模型存在潜在的生物安全风险,并为前沿模型的健壮安全保护技术的发展提供了见解。这些代码将在https://github.com/jigang-fan/SafeProtein上公开。
更新时间: 2025-10-08 17:47:56
领域: cs.LG,cs.AI,cs.CR,q-bio.BM,q-bio.QM
On the Convergence of Moral Self-Correction in Large Language Models
Large Language Models (LLMs) are able to improve their responses when instructed to do so, a capability known as self-correction. When instructions provide only a general and abstract goal without specific details about potential issues in the response, LLMs must rely on their internal knowledge to improve response quality, a process referred to as intrinsic self-correction. The empirical success of intrinsic self-correction is evident in various applications, but how and why it is effective remains unknown. Focusing on moral self-correction in LLMs, we reveal a key characteristic of intrinsic self-correction: performance convergence through multi-round interactions; and provide a mechanistic analysis of this convergence behavior. Based on our experimental results and analysis, we uncover the underlying mechanism of convergence: consistently injected self-correction instructions activate moral concepts that reduce model uncertainty, leading to converged performance as the activated moral concepts stabilize over successive rounds. This paper demonstrates the strong potential of moral self-correction by showing that it exhibits a desirable property of converged performance.
Updated: 2025-10-08 17:46:27
标题: 关于大型语言模型中道德自我修正的收敛性
摘要: 大型语言模型(LLMs)能够在被指示时改善其回应,这种能力被称为自我纠正。当指示仅提供一个一般和抽象的目标,而没有关于回应中潜在问题的具体细节时,LLMs必须依靠其内部知识来提高回应质量,这个过程称为内在自我纠正。内在自我纠正的经验成功在各种应用中是显而易见的,但它为何有效仍然未知。关注LLMs中的道德自我纠正,我们揭示了内在自我纠正的一个关键特征:通过多轮交互实现性能收敛;并对这种收敛行为进行了机械分析。基于我们的实验结果和分析,我们揭示了收敛的底层机制:持续注入的自我纠正指令激活道德概念,减少模型的不确定性,随着激活的道德概念在连续回合中稳定,导致了性能的收敛。本文通过展示道德自我纠正表现出了性能收敛的理想特性,展示了道德自我纠正的巨大潜力。
更新时间: 2025-10-08 17:46:27
领域: cs.CL,cs.LG
MolGA: Molecular Graph Adaptation with Pre-trained 2D Graph Encoder
Molecular graph representation learning is widely used in chemical and biomedical research. While pre-trained 2D graph encoders have demonstrated strong performance, they overlook the rich molecular domain knowledge associated with submolecular instances (atoms and bonds). While molecular pre-training approaches incorporate such knowledge into their pre-training objectives, they typically employ designs tailored to a specific type of knowledge, lacking the flexibility to integrate diverse knowledge present in molecules. Hence, reusing widely available and well-validated pre-trained 2D encoders, while incorporating molecular domain knowledge during downstream adaptation, offers a more practical alternative. In this work, we propose MolGA, which adapts pre-trained 2D graph encoders to downstream molecular applications by flexibly incorporating diverse molecular domain knowledge. First, we propose a molecular alignment strategy that bridge the gap between pre-trained topological representations with domain-knowledge representations. Second, we introduce a conditional adaptation mechanism that generates instance-specific tokens to enable fine-grained integration of molecular domain knowledge for downstream tasks. Finally, we conduct extensive experiments on eleven public datasets, demonstrating the effectiveness of MolGA.
Updated: 2025-10-08 17:46:22
标题: MolGA:使用预训练的2D图编码器进行分子图适应
摘要: 分子图表示学习在化学和生物医学研究中被广泛使用。尽管预训练的2D图编码器表现出色,但它们忽视了与亚分子实例(原子和键)相关的丰富分子领域知识。虽然分子预训练方法将这种知识纳入其预训练目标中,但它们通常采用专门设计的方法来针对特定类型的知识,缺乏集成分子中存在的多样知识的灵活性。因此,在在下游适应过程中,重复使用广泛可用且经过验证的预训练2D编码器,并同时纳入分子领域知识,提供了一种更实用的选择。在这项工作中,我们提出了MolGA,通过灵活地整合多样化的分子领域知识,将预训练的2D图编码器适应到下游分子应用中。首先,我们提出了一种分子对齐策略,弥合了预训练的拓扑表示与领域知识表示之间的差距。其次,我们引入了一种条件适应机制,生成实例特定的标记,以实现对下游任务的分子领域知识的精细集成。最后,我们在十一个公开数据集上进行了大量实验,展示了MolGA的有效性。
更新时间: 2025-10-08 17:46:22
领域: cs.LG
Evolutionary Profiles for Protein Fitness Prediction
Predicting the fitness impact of mutations is central to protein engineering but constrained by limited assays relative to the size of sequence space. Protein language models (pLMs) trained with masked language modeling (MLM) exhibit strong zero-shot fitness prediction; we provide a unifying view by interpreting natural evolution as implicit reward maximization and MLM as inverse reinforcement learning (IRL), in which extant sequences act as expert demonstrations and pLM log-odds serve as fitness estimates. Building on this perspective, we introduce EvoIF, a lightweight model that integrates two complementary sources of evolutionary signal: (i) within-family profiles from retrieved homologs and (ii) cross-family structural-evolutionary constraints distilled from inverse folding logits. EvoIF fuses sequence-structure representations with these profiles via a compact transition block, yielding calibrated probabilities for log-odds scoring. On ProteinGym (217 mutational assays; >2.5M mutants), EvoIF and its MSA-enabled variant achieve state-of-the-art or competitive performance while using only 0.15% of the training data and fewer parameters than recent large models. Ablations confirm that within-family and cross-family profiles are complementary, improving robustness across function types, MSA depths, taxa, and mutation depths. The codes will be made publicly available at https://github.com/aim-uofa/EvoIF.
Updated: 2025-10-08 17:46:02
标题: 蛋白质适应性预测的进化特征Profiles
摘要: 预测突变对健康的影响对蛋白质工程至关重要,但受限于有限的实验相对于序列空间的大小。使用掩码语言建模(MLM)训练的蛋白质语言模型(pLMs)表现出强大的零射击健康预测能力;我们通过将自然进化解释为隐式奖励最大化,将MLM解释为逆强化学习(IRL),其中现存序列充当专家示范,pLM对数几率作为健康估计,提供一个统一的视角。基于这一视角,我们介绍了EvoIF,这是一个集成了两种互补的进化信号源的轻量级模型:(i)从检索到的同源物中获得的家族内个人资料和(ii)从逆折叠对数中提炼出的跨家族结构进化约束。EvoIF通过一个紧凑的转换块将序列-结构表示与这些个人资料融合起来,产生了用于对数几率评分的校准概率。在ProteinGym(217个突变实验;> 250万突变体)上,EvoIF及其MSA启用的变体在仅使用0.15%的训练数据和比最近的大型模型更少的参数的情况下实现了最先进或具有竞争力的表现。消融实验证实,家族内和跨家族个人资料是互补的,提高了在功能类型、MSA深度、分类群和突变深度之间的鲁棒性。代码将在https://github.com/aim-uofa/EvoIF上公开。
更新时间: 2025-10-08 17:46:02
领域: cs.LG,cs.AI,q-bio.BM,q-bio.QM
GTCN-G: A Residual Graph-Temporal Fusion Network for Imbalanced Intrusion Detection (Preprint)
The escalating complexity of network threats and the inherent class imbalance in traffic data present formidable challenges for modern Intrusion Detection Systems (IDS). While Graph Neural Networks (GNNs) excel in modeling topological structures and Temporal Convolutional Networks (TCNs) are proficient in capturing time-series dependencies, a framework that synergistically integrates both while explicitly addressing data imbalance remains an open challenge. This paper introduces a novel deep learning framework, named Gated Temporal Convolutional Network and Graph (GTCN-G), engineered to overcome these limitations. Our model uniquely fuses a Gated TCN (G-TCN) for extracting hierarchical temporal features from network flows with a Graph Convolutional Network (GCN) designed to learn from the underlying graph structure. The core innovation lies in the integration of a residual learning mechanism, implemented via a Graph Attention Network (GAT). This mechanism preserves original feature information through residual connections, which is critical for mitigating the class imbalance problem and enhancing detection sensitivity for rare malicious activities (minority classes). We conducted extensive experiments on two public benchmark datasets, UNSW-NB15 and ToN-IoT, to validate our approach. The empirical results demonstrate that the proposed GTCN-G model achieves state-of-the-art performance, significantly outperforming existing baseline models in both binary and multi-class classification tasks.
Updated: 2025-10-08 17:45:59
标题: GTCN-G:一种用于不平衡入侵检测的剩余图时空融合网络(预印本)
摘要: 网络威胁的不断复杂化和流量数据中固有的类别不平衡给现代入侵检测系统(IDS)带来了巨大挑战。虽然图神经网络(GNNs)在建模拓扑结构方面表现出色,而时间卷积网络(TCNs)擅长捕捉时间序列依赖关系,但一个同时集成两者并明确解决数据不平衡问题的框架仍然是一个未解之谜。本文介绍了一种名为Gated Temporal Convolutional Network and Graph(GTCN-G)的新型深度学习框架,旨在克服这些局限性。我们的模型独特地融合了一个用于从网络流中提取分层时间特征的Gated TCN(G-TCN)和一个设计用于学习底层图结构的图卷积网络(GCN)。核心创新在于集成了一个通过图注意力网络(GAT)实现的残差学习机制。这种机制通过残差连接保留了原始特征信息,这对于缓解类别不平衡问题并增强对罕见恶意活动(少数类别)的检测灵敏度至关重要。我们在两个公共基准数据集UNSW-NB15和ToN-IoT上进行了大量实验以验证我们的方法。实证结果表明,所提出的GTCN-G模型在二元和多类别分类任务中均取得了最先进的性能,明显优于现有基准模型。
更新时间: 2025-10-08 17:45:59
领域: cs.LG,cs.AI
Online Rubrics Elicitation from Pairwise Comparisons
Rubrics provide a flexible way to train LLMs on open-ended long-form answers where verifiable rewards are not applicable and human preferences provide coarse signals. Prior work shows that reinforcement learning with rubric-based rewards leads to consistent gains in LLM post-training. Most existing approaches rely on rubrics that remain static over the course of training. Such static rubrics, however, are vulnerable to reward-hacking type behaviors and fail to capture emergent desiderata that arise during training. We introduce Online Rubrics Elicitation (OnlineRubrics), a method that dynamically curates evaluation criteria in an online manner through pairwise comparisons of responses from current and reference policies. This online process enables continuous identification and mitigation of errors as training proceeds. Empirically, this approach yields consistent improvements of up to 8% over training exclusively with static rubrics across AlpacaEval, GPQA, ArenaHard as well as the validation sets of expert questions and rubrics. We qualitatively analyze the elicited criteria and identify prominent themes such as transparency, practicality, organization, and reasoning.
Updated: 2025-10-08 17:44:59
标题: 通过成对比较在线获取评分标准
摘要: Rubrics提供了一种灵活的方式来训练LLMs进行开放式长篇回答,其中可验证的奖励不适用,人类偏好提供粗糙的信号。先前的研究表明,基于标尺的强化学习可以带来LLM训练后的持续增益。大多数现有方法依赖于在训练过程中保持不变的标尺。然而,这种静态标尺容易受到奖励黑客行为的影响,无法捕捉训练过程中出现的新兴期望。我们引入了在线标尺引导(OnlineRubrics)方法,通过对当前和参考策略的响应进行成对比较,在线动态地策划评估标准。这种在线过程使得在训练过程中能够持续识别和缓解错误。从经验上看,与仅使用静态标尺进行训练相比,这种方法在AlpacaEval、GPQA、ArenaHard以及专家问题和标尺的验证集上可以实现高达8%的持续改进。我们从质性上分析了引出的标准,并确定了突出的主题,如透明度、实用性、组织性和推理。
更新时间: 2025-10-08 17:44:59
领域: cs.CL,cs.AI,cs.LG
Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
With the emergence of powerful large-scale foundation models, the training paradigm is increasingly shifting from from-scratch training to transfer learning. This enables high utility training with small, domain-specific datasets typical in sensitive applications. Membership inference attacks (MIAs) provide an empirical estimate of the privacy leakage by machine learning models. Yet, prior assessments of MIAs against models fine-tuned with transfer learning rely on a small subset of possible attacks. We address this by comparing performance of diverse MIAs in transfer learning settings to help practitioners identify the most efficient attacks for privacy risk evaluation. We find that attack efficacy decreases with the increase in training data for score-based MIAs. We find that there is no one MIA which captures all privacy risks in models trained with transfer learning. While the Likelihood Ratio Attack (LiRA) demonstrates superior performance across most experimental scenarios, the Inverse Hessian Attack (IHA) proves to be more effective against models fine-tuned on PatchCamelyon dataset in high data regime.
Updated: 2025-10-08 17:41:41
标题: 深度迁移学习中成员推理攻击的经验比较
摘要: 随着强大的大规模基础模型的出现,训练范式越来越从零开始训练转向迁移学习。这使得在敏感应用中典型的小型领域特定数据集中进行高效的训练成为可能。成员推理攻击(MIAs)提供了对机器学习模型隐私泄漏的经验估计。然而,对于使用迁移学习进行微调的模型的MIAs的先前评估依赖于可能攻击的一小部分子集。我们通过比较迁移学习环境中不同MIAs的性能来帮助从业者确定用于隐私风险评估的最有效攻击。我们发现,对于基于分数的MIAs来说,随着训练数据的增加,攻击效果会下降。我们发现没有一种MIA能够捕捉到所有使用迁移学习训练的模型中的所有隐私风险。虽然Likelihood Ratio Attack(LiRA)在大多数实验场景中表现出优越性能,但在高数据量情况下,Inverse Hessian Attack(IHA)对于在PatchCamelyon数据集上进行微调的模型更为有效。
更新时间: 2025-10-08 17:41:41
领域: cs.LG,cs.CR
Dual Natural Gradient Descent for Scalable Training of Physics-Informed Neural Networks
Natural-gradient methods markedly accelerate the training of Physics-Informed Neural Networks (PINNs), yet their Gauss--Newton update must be solved in the parameter space, incurring a prohibitive $O(n^3)$ time complexity, where $n$ is the number of network trainable weights. We show that exactly the same step can instead be formulated in a generally smaller residual space of size $m = \sum_{\gamma} N_{\gamma} d_{\gamma}$, where each residual class $\gamma$ (e.g. PDE interior, boundary, initial data) contributes $N_{\gamma}$ collocation points of output dimension $d_{\gamma}$. Building on this insight, we introduce \textit{Dual Natural Gradient Descent} (D-NGD). D-NGD computes the Gauss--Newton step in residual space, augments it with a geodesic-acceleration correction at negligible extra cost, and provides both a dense direct solver for modest $m$ and a Nystrom-preconditioned conjugate-gradient solver for larger $m$. Experimentally, D-NGD scales second-order PINN optimization to networks with up to 12.8 million parameters, delivers one- to three-order-of-magnitude lower final error $L^2$ than first-order methods (Adam, SGD) and quasi-Newton methods, and -- crucially -- enables natural-gradient training of PINNs at this scale on a single GPU.
Updated: 2025-10-08 17:39:20
标题: 双自然梯度下降用于物理信息神经网络可扩展训练
摘要: 自然梯度方法显著加速了物理推断神经网络(PINNs)的训练,然而它们的高斯-牛顿更新必须在参数空间中解决,导致了$O(n^3)$的时间复杂度,其中$n$是网络可训练权重的数量。我们展示了完全相同的步骤可以在一个通常较小的大小为$m = \sum_{\gamma} N_{\gamma} d_{\gamma}$的残差空间中进行重新构建,其中每个残差类别$\gamma$(例如PDE内部、边界、初始数据)贡献了$N_{\gamma}$个输出维度为$d_{\gamma}$的插值点。 基于这一洞察力,我们引入了“双自然梯度下降”(D-NGD)。D-NGD在残差空间中计算高斯-牛顿步骤,在几乎没有额外成本的情况下增加了测地线加速校正,并为适度的$m$提供了密集直接求解器,以及为较大的$m$提供了Nystrom预条件共轭梯度求解器。 在实验中,D-NGD将二阶PINN优化扩展到具有高达1280万参数的网络,比一阶方法(Adam、SGD)和拟牛顿方法获得了一个到三个数量级较低的最终误差$L^2,并且关键地,在单个GPU上实现了这一规模的PINN的自然梯度训练。
更新时间: 2025-10-08 17:39:20
领域: cs.LG,math.OC
Dynamic Regret Bounds for Online Omniprediction with Long Term Constraints
We present an algorithm guaranteeing dynamic regret bounds for online omniprediction with long term constraints. The goal in this recently introduced problem is for a learner to generate a sequence of predictions which are broadcast to a collection of downstream decision makers. Each decision maker has their own utility function, as well as a vector of constraint functions, each mapping their actions and an adversarially selected state to reward or constraint violation terms. The downstream decision makers select actions "as if" the state predictions are correct, and the goal of the learner is to produce predictions such that all downstream decision makers choose actions that give them worst-case utility guarantees while minimizing worst-case constraint violation. Within this framework, we give the first algorithm that obtains simultaneous \emph{dynamic regret} guarantees for all of the agents -- where regret for each agent is measured against a potentially changing sequence of actions across rounds of interaction, while also ensuring vanishing constraint violation for each agent. Our results do not require the agents themselves to maintain any state -- they only solve one-round constrained optimization problems defined by the prediction made at that round.
Updated: 2025-10-08 17:28:05
标题: 在线全预测长期约束的动态后悔界限
摘要: 我们提出了一种算法,保证在线全预测过程中长期约束的动态遗憾边界。这个最近引入的问题的目标是让学习者生成一系列预测,这些预测被广播给一群下游决策者。每个决策者都有自己的效用函数,以及一组约束函数向量,每个函数将他们的行动和一个对抗选择的状态映射到奖励或约束违规条款。下游决策者选择行动“仿佛”状态预测是正确的,学习者的目标是产生预测,以便所有下游决策者选择行动,使他们获得最坏情况下的效用保证,同时最小化最坏情况下的约束违规。在这个框架内,我们提供了第一个算法,为所有代理人获得同时的动态遗憾保证 - 每个代理的遗憾是根据在互动轮回中的一系列行动可能发生变化而衡量的,同时也确保每个代理的约束违规逐渐消失。我们的结果不要求代理人自己维护任何状态 - 他们只需要解决在该轮次进行的预测所定义的一轮受限优化问题。
更新时间: 2025-10-08 17:28:05
领域: cs.LG,cs.GT
Test-Time Graph Search for Goal-Conditioned Reinforcement Learning
Offline goal-conditioned reinforcement learning (GCRL) trains policies that reach user-specified goals at test time, providing a simple, unsupervised, domain-agnostic way to extract diverse behaviors from unlabeled, reward-free datasets. Nonetheless, long-horizon decision making remains difficult for GCRL agents due to temporal credit assignment and error accumulation, and the offline setting amplifies these effects. To alleviate this issue, we introduce Test-Time Graph Search (TTGS), a lightweight planning approach to solve the GCRL task. TTGS accepts any state-space distance or cost signal, builds a weighted graph over dataset states, and performs fast search to assemble a sequence of subgoals that a frozen policy executes. When the base learner is value-based, the distance is derived directly from the learned goal-conditioned value function, so no handcrafted metric is needed. TTGS requires no changes to training, no additional supervision, no online interaction, and no privileged information, and it runs entirely at inference. On the OGBench benchmark, TTGS improves success rates of multiple base learners on challenging locomotion tasks, demonstrating the benefit of simple metric-guided test-time planning for offline GCRL.
Updated: 2025-10-08 17:20:53
标题: 测试时间图搜索用于目标条件强化学习
摘要: 离线目标条件强化学习(GCRL)训练可以在测试时达到用户指定目标的策略,为从未标记、无奖励数据集提取多样行为提供了一种简单的、无监督的、与领域无关的方法。然而,由于时间信用分配和误差累积,长期决策对于GCRL代理仍然困难,离线设置加剧了这些影响。为了缓解这个问题,我们引入了测试时间图搜索(TTGS),这是一种轻量级的规划方法,用于解决GCRL任务。TTGS接受任何状态空间距离或成本信号,构建数据集状态之间的加权图,并执行快速搜索以组装一系列子目标,冻结策略执行。当基础学习者是基于价值的时,距离直接从学习的目标条件值函数中导出,因此不需要手工制作的度量标准。TTGS不需要对训练进行任何更改,也不需要额外的监督、在线交互或特权信息,完全在推理阶段运行。在OGBench基准测试中,TTGS提高了多个基础学习者在具有挑战性的运动任务上的成功率,展示了离线GCRL的简单度量引导测试时间规划的好处。
更新时间: 2025-10-08 17:20:53
领域: cs.LG
Discriminative Feature Feedback with General Teacher Classes
We study the theoretical properties of the interactive learning protocol Discriminative Feature Feedback (DFF) (Dasgupta et al., 2018). The DFF learning protocol uses feedback in the form of discriminative feature explanations. We provide the first systematic study of DFF in a general framework that is comparable to that of classical protocols such as supervised learning and online learning. We study the optimal mistake bound of DFF in the realizable and the non-realizable settings, and obtain novel structural results, as well as insights into the differences between Online Learning and settings with richer feedback such as DFF. We characterize the mistake bound in the realizable setting using a new notion of dimension. In the non-realizable setting, we provide a mistake upper bound and show that it cannot be improved in general. Our results show that unlike Online Learning, in DFF the realizable dimension is insufficient to characterize the optimal non-realizable mistake bound or the existence of no-regret algorithms.
Updated: 2025-10-08 17:14:22
标题: 具有通用教师类别的判别性特征反馈
摘要: 我们研究了交互式学习协议Discriminative Feature Feedback(DFF)(Dasgupta等人,2018)的理论特性。DFF学习协议使用区分性特征解释形式的反馈。我们在一个类似于监督学习和在线学习等经典协议的一般框架中首次系统研究了DFF。我们研究了DFF在可实现和不可实现设置中的最优错误界,并获得了新颖的结构结果,以及对在线学习和具有更丰富反馈的设置(如DFF)之间的差异的见解。我们利用一个新的维度概念来表征可实现设置中的错误界。在不可实现设置中,我们提供了一个错误上界,并且显示通常情况下无法改进它。我们的结果表明,与在线学习不同,在DFF中,可实现维度不足以表征最优的不可实现错误界或无悔算法的存在。
更新时间: 2025-10-08 17:14:22
领域: cs.LG
Lossy Neural Compression for Geospatial Analytics: A Review
Over the past decades, there has been an explosion in the amount of available Earth Observation (EO) data. The unprecedented coverage of the Earth's surface and atmosphere by satellite imagery has resulted in large volumes of data that must be transmitted to ground stations, stored in data centers, and distributed to end users. Modern Earth System Models (ESMs) face similar challenges, operating at high spatial and temporal resolutions, producing petabytes of data per simulated day. Data compression has gained relevance over the past decade, with neural compression (NC) emerging from deep learning and information theory, making EO data and ESM outputs ideal candidates due to their abundance of unlabeled data. In this review, we outline recent developments in NC applied to geospatial data. We introduce the fundamental concepts of NC including seminal works in its traditional applications to image and video compression domains with focus on lossy compression. We discuss the unique characteristics of EO and ESM data, contrasting them with "natural images", and explain the additional challenges and opportunities they present. Moreover, we review current applications of NC across various EO modalities and explore the limited efforts in ESM compression to date. The advent of self-supervised learning (SSL) and foundation models (FM) has advanced methods to efficiently distill representations from vast unlabeled data. We connect these developments to NC for EO, highlighting the similarities between the two fields and elaborate on the potential of transferring compressed feature representations for machine--to--machine communication. Based on insights drawn from this review, we devise future directions relevant to applications in EO and ESM.
Updated: 2025-10-08 17:10:10
标题: 地理空间分析的损失神经压缩:一项评估
摘要: 在过去的几十年里,地球观测(EO)数据的数量爆炸性增长。卫星图像对地球表面和大气的前所未有覆盖导致产生了大量数据,这些数据必须传输到地面站,存储在数据中心,并分发给最终用户。现代地球系统模型(ESMs)面临着类似的挑战,以高空间和时间分辨率运行,每天模拟产生的数据量达到PB级。在过去的十年中,数据压缩变得越来越重要,神经压缩(NC)从深度学习和信息论中出现,使得EO数据和ESM输出成为理想的候选者,因为它们具有大量未标记的数据。在本综述中,我们概述了应用于地理空间数据的NC的最新发展。我们介绍了NC的基本概念,包括其在传统图像和视频压缩领域的开创性工作,重点放在有损压缩上。我们讨论了EO和ESM数据的独特特征,将它们与“自然图像”进行对比,并解释了它们提出的额外挑战和机遇。此外,我们回顾了跨各种EO模态的NC的当前应用,并探讨了迄今为止在ESM压缩方面的有限努力。自监督学习(SSL)和基础模型(FM)的出现推动了从大量未标记数据中高效提取表示的方法。我们将这些发展与EO的NC联系起来,突出了这两个领域之间的相似之处,并详细阐述了将压缩特征表示传递给机器之间通信的潜力。根据本综述得出的见解,我们制定了与EO和ESM应用相关的未来方向。
更新时间: 2025-10-08 17:10:10
领域: eess.SP,cs.AI,cs.CV,cs.LG,physics.geo-ph
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Post-training for reasoning of large language models (LLMs) increasingly relies on verifiable rewards: deterministic checkers that provide 0-1 correctness signals. While reliable, such binary feedback is brittle--many tasks admit partially correct or alternative answers that verifiers under-credit, and the resulting all-or-nothing supervision limits learning. Reward models offer richer, continuous feedback, which can serve as a complementary supervisory signal to verifiers. We introduce HERO (Hybrid Ensemble Reward Optimization), a reinforcement learning framework that integrates verifier signals with reward-model scores in a structured way. HERO employs stratified normalization to bound reward-model scores within verifier-defined groups, preserving correctness while refining quality distinctions, and variance-aware weighting to emphasize challenging prompts where dense signals matter most. Across diverse mathematical reasoning benchmarks, HERO consistently outperforms RM-only and verifier-only baselines, with strong gains on both verifiable and hard-to-verify tasks. Our results show that hybrid reward design retains the stability of verifiers while leveraging the nuance of reward models to advance reasoning.
Updated: 2025-10-08 17:09:41
标题: 混合强化:当奖励稀缺时,更好的选择是密集的
摘要: 针对大型语言模型(LLMs)推理的后训练越来越依赖可验证的奖励:提供0-1正确性信号的确定性检查器。尽管可靠,但这种二元反馈是脆弱的 - 许多任务允许部分正确或替代答案,验证器低估了这些答案,导致了全有或全无的监督限制了学习。奖励模型提供更丰富、连续的反馈,可以作为验证器的补充监督信号。我们介绍了HERO(Hybrid Ensemble Reward Optimization),这是一个结合了验证器信号和奖励模型分数的强化学习框架。HERO采用分层归一化将奖励模型分数限制在验证器定义的组内,保持正确性同时提炼质量区别,并且采用方差感知加权来强调在密集信号最重要的挑战性提示。在各种数学推理基准测试中,HERO始终优于仅使用RM或仅使用验证器的基线,在可验证和难以验证的任务上均取得了显著的收益。我们的结果表明混合奖励设计保留了验证器的稳定性,同时利用奖励模型的细微差别推进了推理。
更新时间: 2025-10-08 17:09:41
领域: cs.CL,cs.LG
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science. Code is at https://github.com/innovatingAI/AutoMind.
Updated: 2025-10-08 17:06:47
标题: AutoMind:自适应知识智能体用于自动化数据科学
摘要: 大型语言模型(LLM)代理在解决现实世界数据科学问题方面显示出巨大潜力。以LLM为驱动的数据科学代理承诺自动化整个机器学习流程,但它们在现实世界中的有效性仍然有限。现有框架依赖于刚性、预定义的工作流程和不灵活的编码策略;因此,它们只在相对简单、传统的问题上表现出色,无法捕捉人类从业者在复杂、创新任务中带来的经验知识。在这项工作中,我们介绍了AutoMind,一种自适应、知识丰富的LLM代理框架,通过三个关键进展克服了这些缺陷:(1)一个策划的专家知识库,使代理在领域专家知识中立足,(2)一种具有知识的代理树搜索算法,策略性地探索可能的解决方案,以及(3)一种自适应编码策略,动态地根据任务复杂性定制代码生成。对两个自动化数据科学基准的评估表明,AutoMind相对于最先进的基线模型提供了优越的性能。额外的分析证实了有利的有效性、效率和定性解决方案质量,突显AutoMind作为全自动数据科学的高效和稳健步骤。代码位于https://github.com/innovatingAI/AutoMind。
更新时间: 2025-10-08 17:06:47
领域: cs.CL,cs.AI,cs.HC,cs.LG,cs.MA
Prefilled responses enhance zero-shot detection of AI-generated images
As AI models generate increasingly realistic images, growing concerns over potential misuse underscore the need for reliable detection. Traditional supervised detection methods depend on large, curated datasets for training and often fail to generalize to novel, out-of-domain image generators. As an alternative, we explore pre-trained Vision-Language Models (VLMs) for zero-shot detection of AI-generated images. We evaluate VLM performance on three diverse benchmarks encompassing synthetic images of human faces, objects, and animals produced by 16 different state-of-the-art image generators. While off-the-shelf VLMs perform poorly on these datasets, we find that their reasoning can be guided effectively through simple response prefilling -- a method we call Prefill-Guided Thinking (PGT). In particular, prefilling a VLM response with the task-aligned phrase "Let's examine the style and the synthesis artifacts" improves the Macro F1 scores of three widely used open-source VLMs by up to 24%.
Updated: 2025-10-08 16:59:43
标题: 填充响应增强零-shot检测AI生成的图像
摘要: 随着人工智能模型生成越来越逼真的图像,对潜在滥用的担忧日益加剧,也凸显了可靠检测的必要性。传统的监督式检测方法依赖于大量策划的数据集用于训练,并且往往无法泛化到新颖的、超领域的图像生成器。作为替代方案,我们探索了预训练的视觉-语言模型(VLMs)用于零样本检测人工智能生成的图像。我们评估了VLM在三个不同基准测试中的表现,包括由16种不同的最先进图像生成器生成的合成人脸、物体和动物图像。尽管现成的VLM在这些数据集上表现不佳,但我们发现通过简单的响应预填充可以有效引导它们的推理,这一方法被称为预填充引导思维(PGT)。特别是,通过用任务对齐的短语“让我们检查风格和综合工艺”预填充VLM的响应,可以将三种广泛使用的开源VLM的宏F1分数提高高达24%。
更新时间: 2025-10-08 16:59:43
领域: cs.LG,cs.AI,cs.CL
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality
Large Language Models (LLMs), particularly slow-thinking models, often exhibit severe hallucination, outputting incorrect content due to an inability to accurately recognize knowledge boundaries during reasoning. While Reinforcement Learning (RL) can enhance complex reasoning abilities, its outcome-oriented reward mechanism often lacks factual supervision over the thinking process, further exacerbating the hallucination problem. To address the high hallucination in slow-thinking models, we propose Knowledge-enhanced RL, KnowRL. KnowRL guides models to perform fact-based slow thinking by integrating a factuality reward, based on knowledge verification, into the RL training process, helping them recognize their knowledge boundaries. KnowRL guides models to perform fact-based slow thinking by integrating a factuality reward, based on knowledge verification, into the RL training process, helping them recognize their knowledge boundaries. This targeted factual input during RL training enables the model to learn and internalize fact-based reasoning strategies. By directly rewarding adherence to facts within the reasoning steps, KnowRL fosters a more reliable thinking process. Experimental results on three hallucination evaluation datasets and two reasoning evaluation datasets demonstrate that KnowRL effectively mitigates hallucinations in slow-thinking models while maintaining their original strong reasoning capabilities. Our code is available at https://github.com/zjunlp/KnowRL.
Updated: 2025-10-08 16:56:59
标题: KnowRL:探索知识强化学习以获取事实性
摘要: 大型语言模型(LLMs),特别是慢思考模型,经常表现出严重的幻觉,输出不正确的内容,因为在推理过程中无法准确识别知识边界。虽然强化学习(RL)可以增强复杂的推理能力,但其以结果为导向的奖励机制通常缺乏对思维过程的事实监督,进一步加剧了幻觉问题。为了解决慢思考模型中的高幻觉问题,我们提出了增强知识的强化学习,KnowRL。KnowRL通过将基于知识验证的事实性奖励整合到RL训练过程中,指导模型执行基于事实的慢思考,帮助它们识别自己的知识边界。这种在RL训练过程中的有针对性的事实输入使模型能够学习和内化基于事实的推理策略。通过直接奖励在推理步骤中遵循事实,KnowRL促进了更可靠的思考过程。在三个幻觉评估数据集和两个推理评估数据集上的实验结果表明,KnowRL有效地减轻了慢思考模型中的幻觉,同时保持了它们原有的强大推理能力。我们的代码可在https://github.com/zjunlp/KnowRL 上找到。
更新时间: 2025-10-08 16:56:59
领域: cs.AI,cs.CL,cs.CV,cs.LG,cs.MA
Bit-Level Discrete Diffusion with Markov Probabilistic Models: An Improved Framework with Sharp Convergence Bounds under Minimal Assumptions
This paper introduces Discrete Markov Probabilistic Models (DMPMs), a novel discrete diffusion algorithm for discrete data generation. The algorithm operates in discrete bit space, where the noising process is a continuous-time Markov chain that flips labels uniformly at random. The time-reversal process, like the forward noise process, is a jump process with its intensity governed by a discrete analogue of the classical score function. Crucially, this intensity is proven to be the conditional expectation of a function of the forward process, underlining theoretical alignment with score-based generative models. We establish convergence bounds for the algorithm under minimal assumptions, ensuring robustness and efficiency, which we demonstrate through experiments on low-dimensional Bernoulli-distributed datasets and high-dimensional binary MNIST data. The results highlight competitive performance in generating discrete structures compared to the state-of-the-art. This work bridges theoretical foundations and practical applications, advancing the development of effective and theoretically grounded discrete generative modeling.
Updated: 2025-10-08 16:55:19
标题: 比特级别的离散扩散与马尔可夫概率模型:在最小假设下具有尖锐收敛界限的改进框架
摘要: 本文介绍了离散马尔可夫概率模型(DMPMs),这是一种新颖的用于离散数据生成的离散扩散算法。该算法在离散比特空间中运行,其中噪声过程是一个连续时间马尔可夫链,以均匀随机地翻转标签。时空转换过程,就像前向噪声过程一样,是一个跳跃过程,其强度由经典得分函数的离散类比所控制。关键是,这种强度被证明是前向过程的一个函数的条件期望,强调了与基于得分的生成模型的理论对齐。我们在最小假设下建立了算法的收敛界限,确保了其稳健性和效率,我们通过对低维伯努利分布数据集和高维二元MNIST数据的实验进行了演示。结果突出了与最先进技术相比在生成离散结构方面的竞争性表现。这项工作架起了理论基础和实际应用之间的桥梁,推进了有效和理论基础的离散生成建模的发展。
更新时间: 2025-10-08 16:55:19
领域: stat.ML,cs.LG
Differential Privacy for Adaptive Weight Aggregation in Federated Tumor Segmentation
Federated Learning (FL) is a distributed machine learning approach that safeguards privacy by creating an impartial global model while respecting the privacy of individual client data. However, the conventional FL method can introduce security risks when dealing with diverse client data, potentially compromising privacy and data integrity. To address these challenges, we present a differential privacy (DP) federated deep learning framework in medical image segmentation. In this paper, we extend our similarity weight aggregation (SimAgg) method to DP-SimAgg algorithm, a differentially private similarity-weighted aggregation algorithm for brain tumor segmentation in multi-modal magnetic resonance imaging (MRI). Our DP-SimAgg method not only enhances model segmentation capabilities but also provides an additional layer of privacy preservation. Extensive benchmarking and evaluation of our framework, with computational performance as a key consideration, demonstrate that DP-SimAgg enables accurate and robust brain tumor segmentation while minimizing communication costs during model training. This advancement is crucial for preserving the privacy of medical image data and safeguarding sensitive information. In conclusion, adding a differential privacy layer in the global weight aggregation phase of the federated brain tumor segmentation provides a promising solution to privacy concerns without compromising segmentation model efficacy. By leveraging DP, we ensure the protection of client data against adversarial attacks and malicious participants.
Updated: 2025-10-08 16:53:55
标题: 《联邦肿瘤分割中自适应权重聚合的差分隐私》
摘要: 联邦学习(FL)是一种分布式机器学习方法,通过创建一个公正的全局模型来保护隐私,同时尊重个体客户数据的隐私。然而,传统的FL方法在处理多样化的客户数据时可能会引入安全风险,潜在地危及隐私和数据完整性。为了解决这些挑战,我们提出了一种差分隐私(DP)联邦深度学习框架,用于医学图像分割。在本文中,我们将我们的相似性权重聚合(SimAgg)方法扩展为DP-SimAgg算法,这是一种用于多模式磁共振成像(MRI)中脑肿瘤分割的差分隐私相似性加权聚合算法。我们的DP-SimAgg方法不仅增强了模型的分割能力,还提供了额外的隐私保护层。通过对我们的框架进行广泛的基准测试和评估,以计算性能作为关键考虑因素,我们证明了DP-SimAgg在模型训练过程中实现了准确和稳健的脑肿瘤分割,同时最大限度地减少了通信成本。这一进展对于保护医学图像数据的隐私和保护敏感信息至关重要。总之,在联邦脑肿瘤分割的全局权重聚合阶段添加差分隐私层为隐私问题提供了一个有前途的解决方案,同时不影响分割模型的有效性。通过利用差分隐私,我们确保了客户数据免受对抗性攻击和恶意参与者的保护。
更新时间: 2025-10-08 16:53:55
领域: cs.LG,cs.CR,eess.IV
Security-Robustness Trade-offs in Diffusion Steganography: A Comparative Analysis of Pixel-Space and VAE-Based Architectures
Current generative steganography research mainly pursues computationally expensive mappings to perfect Gaussian priors within single diffusion model architectures. This work introduces an efficient framework based on approximate Gaussian mapping governed by a scale factor calibrated through capacity-aware adaptive optimization. Using this framework as a unified analytical tool, systematic comparative analysis of steganography in pixel-space models versus VAE-based latent-space systems is conducted. The investigation reveals a pronounced architecture dependent security-robustness trade-off: pixel-space models achieve high security against steganalysis but exhibit fragility to channel distortions, while VAE-based systems like Stable Diffusion offer substantial robustness at the cost of security vulnerabilities. Further analysis indicates that the VAE component drives this behavior through opposing mechanisms where the encoder confers robustness via manifold regularization while the decoder introduces vulnerabilities by amplifying latent perturbations into detectable artifacts. These findings characterize the conflicting architectural roles in generative steganography and establish a foundation for future research.
Updated: 2025-10-08 16:53:52
标题: 扩散隐写术中的安全性-鲁棒性权衡:像素空间和基于VAE的架构的比较分析
摘要: 目前的生成隐写术研究主要追求通过单一扩散模型架构中完美高斯先验的计算昂贵的映射。本文介绍了一种基于近似高斯映射的高效框架,通过容量感知自适应优化校准的比例因子来控制。利用这个框架作为统一的分析工具,对像素空间模型和基于VAE的潜在空间系统中的隐写术进行了系统比较分析。调查揭示了一个显著的架构相关的安全-鲁棒性权衡:像素空间模型在抵抗隐写术分析方面具有较高的安全性,但对信道失真表现出脆弱性,而基于VAE的系统如Stable Diffusion在安全性漏洞的代价下提供了相当的鲁棒性。进一步的分析表明,VAE组件通过相反的机制推动了这种行为,其中编码器通过流形正则化赋予鲁棒性,而解码器通过将潜在扰动放大为可检测的伪装品而引入了漏洞。这些发现表征了生成隐写术中相互冲突的架构角色,并为未来研究奠定了基础。
更新时间: 2025-10-08 16:53:52
领域: cs.CR
AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction
The cost and accuracy of simulating complex physical systems using the Finite Element Method (FEM) scales with the resolution of the underlying mesh. Adaptive meshes improve computational efficiency by refining resolution in critical regions, but typically require task-specific heuristics or cumbersome manual design by a human expert. We propose Adaptive Meshing By Expert Reconstruction (AMBER), a supervised learning approach to mesh adaptation. Starting from a coarse mesh, AMBER iteratively predicts the sizing field, i.e., a function mapping from the geometry to the local element size of the target mesh, and uses this prediction to produce a new intermediate mesh using an out-of-the-box mesh generator. This process is enabled through a hierarchical graph neural network, and relies on data augmentation by automatically projecting expert labels onto AMBER-generated data during training. We evaluate AMBER on 2D and 3D datasets, including classical physics problems, mechanical components, and real-world industrial designs with human expert meshes. AMBER generalizes to unseen geometries and consistently outperforms multiple recent baselines, including ones using Graph and Convolutional Neural Networks, and Reinforcement Learning-based approaches.
Updated: 2025-10-08 16:48:28
标题: 琥珀:通过迭代网格分辨率预测实现自适应网格生成
摘要: 使用有限元方法(FEM)模拟复杂物理系统的成本和准确性与底层网格的分辨率成比例。自适应网格通过在关键区域细化分辨率来提高计算效率,但通常需要特定任务的启发式方法或人工专家的繁琐手动设计。我们提出了一种名为“专家重建自适应网格”(AMBER)的监督学习方法来适应网格。从粗网格开始,AMBER迭代地预测大小场,即从几何到目标网格的局部元素大小的映射函数,并利用此预测使用开箱即用的网格生成器生成新的中间网格。此过程通过分层图神经网络实现,并依赖于数据增强,通过在训练期间将专家标签自动投影到AMBER生成的数据上。我们在2D和3D数据集上评估了AMBER,包括经典物理问题、机械元件和具有人工专家网格的真实工业设计。AMBER可推广到未见几何形状,并始终优于多个最近的基线方法,包括使用图和卷积神经网络以及基于强化学习的方法。
更新时间: 2025-10-08 16:48:28
领域: cs.LG,cs.CG
A Broader View of Thompson Sampling
Thompson Sampling is one of the most widely used and studied bandit algorithms, known for its simple structure, low regret performance, and solid theoretical guarantees. Yet, in stark contrast to most other families of bandit algorithms, the exact mechanism through which posterior sampling (as introduced by Thompson) is able to "properly" balance exploration and exploitation, remains a mystery. In this paper we show that the core insight to address this question stems from recasting Thompson Sampling as an online optimization algorithm. To distill this, a key conceptual tool is introduced, which we refer to as "faithful" stationarization of the regret formulation. Essentially, the finite horizon dynamic optimization problem is converted into a stationary counterpart which "closely resembles" the original objective (in contrast, the classical infinite horizon discounted formulation, that leads to the Gittins index, alters the problem and objective in too significant a manner). The newly crafted time invariant objective can be studied using Bellman's principle which leads to a time invariant optimal policy. When viewed through this lens, Thompson Sampling admits a simple online optimization form that mimics the structure of the Bellman-optimal policy, and where greediness is regularized by a measure of residual uncertainty based on point-biserial correlation. This answers the question of how Thompson Sampling balances exploration-exploitation, and moreover, provides a principled framework to study and further improve Thompson's original idea.
Updated: 2025-10-08 16:43:02
标题: 汤普森抽样的更广泛视角
摘要: Thompson Sampling是最广泛使用和研究的赌博算法之一,以其简单的结构、低遗憾表现和坚实的理论保证而闻名。然而,与大多数其他赌博算法族群形成鲜明对比的是,通过Thompson引入的后验抽样机制如何能够“适当”平衡探索和利用,仍然是一个谜。本文表明,解决这个问题的核心洞察力源于将Thompson Sampling重新构建为一种在线优化算法。为了提炼这一点,引入了一个关键的概念工具,我们称之为“忠实”遗憾公式的稳定性。基本上,有限时间段的动态优化问题被转化为一个“紧密类似”原始目标的静态对应物(相比之下,导致Gittins指数的经典无限时间段折扣公式会在问题和目标上产生太显著的改变)。新设计的时不变目标可以使用贝尔曼原理进行研究,从而导致一个时不变的最优策略。通过这个视角看,Thompson Sampling可以展示出一个简单的在线优化形式,模仿贝尔曼最优策略的结构,并且贪婪性会根据点双列相关的残余不确定性测度进行正规化。这回答了Thompson Sampling如何平衡探索和利用的问题,而且还提供了一个原则性框架来研究和进一步改进Thompson的原始想法。
更新时间: 2025-10-08 16:43:02
领域: cs.LG
Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts
Mixture-of-Experts (MoE) architectures have emerged as a cornerstone of modern AI systems. In particular, MoEs route inputs dynamically to specialized experts whose outputs are aggregated through weighted summation. Despite their widespread application, theoretical understanding of MoE training dynamics remains limited to either separate expert-router optimization or only top-1 routing scenarios with carefully constructed datasets. This paper advances MoE theory by providing convergence guarantees for joint training of soft-routed MoE models with non-linear routers and experts in a student-teacher framework. We prove that, with moderate over-parameterization, the student network undergoes a feature learning phase, where the router's learning process is ``guided'' by the experts, that recovers the teacher's parameters. Moreover, we show that a post-training pruning can effectively eliminate redundant neurons, followed by a provably convergent fine-tuning process that reaches global optimality. To our knowledge, our analysis is the first to bring novel insights in understanding the optimization landscape of the MoE architecture.
Updated: 2025-10-08 16:40:31
标题: 由专家指导:可证明的软路由混合专家特征学习动态
摘要: Mixture-of-Experts(MoE)结构已经成为现代人工智能系统的基石。特别是,MoEs动态地将输入路由到专门的专家,这些专家的输出通过加权求和进行聚合。尽管它们被广泛应用,但对MoE训练动态的理论理解仍然局限于单独的专家路由器优化或仅针对精心构建的数据集的仅 top-1 路由场景。本文通过提供 soft-routed MoE 模型的非线性路由器和专家在学生-教师框架下的联合训练的收敛保证,推进了MoE理论。我们证明,在适度的过参数化下,学生网络经历了一个特征学习阶段,其中路由器的学习过程受到专家的“指导”,从而恢复了教师的参数。此外,我们展示了后训练修剪可以有效消除冗余神经元,随后进行可证明收敛的微调过程,达到全局最优。据我们所知,我们的分析是第一个在理解MoE架构的优化景观方面带来新见解的研究。
更新时间: 2025-10-08 16:40:31
领域: cs.LG,math.OC
An in-depth look at approximation via deep and narrow neural networks
In 2017, Hanin and Sellke showed that the class of arbitrarily deep, real-valued, feed-forward and ReLU-activated networks of width w forms a dense subset of the space of continuous functions on R^n, with respect to the topology of uniform convergence on compact sets, if and only if w>n holds. To show the necessity, a concrete counterexample function f:R^n->R was used. In this note we actually approximate this very f by neural networks in the two cases w=n and w=n+1 around the aforementioned threshold. We study how the approximation quality behaves if we vary the depth and what effect (spoiler alert: dying neurons) cause that behavior.
Updated: 2025-10-08 16:34:45
标题: 深而窄神经网络逼近的深入探讨
摘要: 2017年,Hanin和Sellke表明,具有任意深度、实值、前馈和ReLU激活网络的宽度w在R^n上形成一个稠密子集,关于紧致集上的一致收敛拓扑,当且仅当w>n时成立。为了证明必要性,使用了一个具体的反例函数f:R^n->R。在本文中,我们实际上通过神经网络在w=n和w=n+1这两种情况下近似这个f函数,围绕上述阈值。我们研究了如果改变深度,近似质量的变化以及导致这种行为的影响(提示:神经元死亡)。
更新时间: 2025-10-08 16:34:45
领域: cs.LG,68T07, 41A30,I.2.6; I.5.1
Last-iterate Convergence for Symmetric, General-sum, $2 \times 2$ Games Under The Exponential Weights Dynamic
We conduct a comprehensive analysis of the discrete-time exponential-weights dynamic with a constant step size on all \emph{general-sum and symmetric} $2 \times 2$ normal-form games, i.e. games with $2$ pure strategies per player, and where the ensuing payoff tuple is of the form $(A,A^\top)$ (where $A$ is the $2 \times 2$ payoff matrix corresponding to the first player). Such symmetric games commonly arise in real-world interactions between "symmetric" agents who have identically defined utility functions -- such as Bertrand competition, multi-agent performative prediction, and certain congestion games -- and display a rich multiplicity of equilibria despite the seemingly simple setting. Somewhat surprisingly, we show through a first-principles analysis that the exponential weights dynamic, which is popular in online learning, converges in the last iterate for such games regardless of initialization with an appropriately chosen step size. For certain games and/or initializations, we further show that the convergence rate is in fact exponential and holds for any step size. We illustrate our theory with extensive simulations and applications to the aforementioned game-theoretic interactions. In the case of multi-agent performative prediction, we formulate a new "mortgage competition" game between lenders (i.e. banks) who interact with a population of customers, and show that it fits into our framework.
Updated: 2025-10-08 16:32:58
标题: 对称、总和广义、$2 \times 2$博弈在指数权重动态下的最后迭代收敛
摘要: 我们对具有恒定步长的离散时间指数权重动态在所有\emph{一般和对称}的$2\times 2$正常形式博弈进行了全面分析,即每个玩家有$2$个纯策略的博弈,博弈的结果支付元组形式为$(A,A^\top)$(其中$A$是对应于第一个玩家的$2\times 2$支付矩阵)。这种对称博弈在现实世界中的“对称”代理之间的交互中经常出现,这些代理具有完全相同定义的效用函数,例如伯特兰竞争、多智能体表现性预测和某些拥堵博弈,尽管表面上看起来是简单设置,但展示了丰富的均衡多样性。令人惊讶的是,我们通过第一原理分析表明,指数权重动态,在在线学习中很受欢迎,在这种博弈中,无论初始化如何选择,都会在最后迭代中收敛,只需选择适当的步长。对于某些博弈和/或初始化,我们进一步表明,收敛速度实际上是指数级的,并且适用于任何步长。 我们通过广泛的模拟和应用展示了我们的理论在上述博弈论交互中的应用。在多智能体表现性预测的情况下,我们制定了一个新的“抵押贷款竞争”游戏,该游戏涉及与客户群互动的贷款人(即银行),并且表明它符合我们的框架。
更新时间: 2025-10-08 16:32:58
领域: cs.GT,cs.LG
On Univariate Sumcheck
Two candidate approaches for univariate sumcheck over roots of unity are presented. The first takes the form of a multilinear evaluation protocol, which can be combined with the standard multivariate sumcheck protocol. The other consists of a direct reduction from univariate sumcheck to multilinear evaluation, which can be combined with Gemini (Bootle et al., Eurocrypt 2022). Both approaches optionally support a very natural exponential round reduction from $m$ to $\log(m)$ while retaining asymptotically optimal linear prover time.
Updated: 2025-10-08 16:29:18
标题: 关于单变量Sumcheck
摘要: 本文介绍了两种针对单位根上的一元求和检查的候选方法。第一种方法采用了多线性评估协议的形式,可以与标准的多元求和检查协议结合使用。另一种方法是将一元求和检查直接降级为多线性评估,可以与Gemini(Bootle等人,Eurocrypt 2022)结合使用。这两种方法都可以选择性地支持将$m$轮减少到$\log(m)$,同时保持渐近最优的线性证明时间。
更新时间: 2025-10-08 16:29:18
领域: cs.CR
Accelerating Inference for Multilayer Neural Networks with Quantum Computers
Fault-tolerant Quantum Processing Units (QPUs) promise to deliver exponential speed-ups in select computational tasks, yet their integration into modern deep learning pipelines remains unclear. In this work, we take a step towards bridging this gap by presenting the first fully-coherent quantum implementation of a multilayer neural network with non-linear activation functions. Our constructions mirror widely used deep learning architectures based on ResNet, and consist of residual blocks with multi-filter 2D convolutions, sigmoid activations, skip-connections, and layer normalizations. We analyse the complexity of inference for networks under three quantum data access regimes. Without any assumptions, we establish a quadratic speedup over classical methods for shallow bilinear-style networks. With efficient quantum access to the weights, we obtain a quartic speedup over classical methods. With efficient quantum access to both the inputs and the network weights, we prove that a network with an $N$-dimensional vectorized input, $k$ residual block layers, and a final residual-linear-pooling layer can be implemented with an error of $\epsilon$ with $O(\text{polylog}(N/\epsilon)^k)$ inference cost.
Updated: 2025-10-08 16:26:50
标题: 使用量子计算机加速多层神经网络的推断
摘要: 容错量子处理单元(QPUs)承诺在选择性计算任务中实现指数级加速,然而它们如何整合到现代深度学习流程中尚不清楚。在这项工作中,我们通过展示第一个具有非线性激活函数的多层神经网络的完全连贯量子实现,迈出了缩小这一差距的一步。我们的构建镜像了基于ResNet的广泛使用的深度学习架构,包括具有多滤波器2D卷积、Sigmoid激活、跳跃连接和层标准化的残差块。我们分析了在三种量子数据访问制度下网络推理的复杂性。在没有任何假设的情况下,我们建立了对于浅双线性风格网络的经典方法的二次加速。通过高效访问权重,我们获得了对于经典方法的四次加速。通过高效访问输入和网络权重,我们证明了具有$N$维向量化输入、$k$个残差块层和最终残差线性池层的网络可以在$O(\text{polylog}(N/\epsilon)^k)$的推理成本下实现误差为$\epsilon$。
更新时间: 2025-10-08 16:26:50
领域: quant-ph,cs.LG
Covert Quantum Learning: Privately and Verifiably Learning from Quantum Data
Quantum learning from remotely accessed quantum compute and data must address two key challenges: verifying the correctness of data and ensuring the privacy of the learner's data-collection strategies and resulting conclusions. The covert (verifiable) learning model of Canetti and Karchmer (TCC 2021) provides a framework for endowing classical learning algorithms with such guarantees. In this work, we propose models of covert verifiable learning in quantum learning theory and realize them without computational hardness assumptions for remote data access scenarios motivated by established quantum data advantages. We consider two privacy notions: (i) strategy-covertness, where the eavesdropper does not gain information about the learner's strategy; and (ii) target-covertness, where the eavesdropper does not gain information about the unknown object being learned. We show: Strategy-covert algorithms for making quantum statistical queries via classical shadows; Target-covert algorithms for learning quadratic functions from public quantum examples and private quantum statistical queries, for Pauli shadow tomography and stabilizer state learning from public multi-copy and private single-copy quantum measurements, and for solving Forrelation and Simon's problem from public quantum queries and private classical queries, where the adversary is a unidirectional or i.i.d. ancilla-free eavesdropper. The lattermost results in particular establish that the exponential separation between classical and quantum queries for Forrelation and Simon's problem survives under covertness constraints. Along the way, we design covert verifiable protocols for quantum data acquisition from public quantum queries which may be of independent interest. Overall, our models and corresponding algorithms demonstrate that quantum advantages are privately and verifiably achievable even with untrusted, remote data.
Updated: 2025-10-08 16:25:28
标题: 隐秘量子学习:从量子数据中私密并可验证地学习
摘要: 从远程访问的量子计算和数据中学习必须解决两个关键挑战:验证数据的正确性以及确保学习者数据收集策略和结果结论的隐私性。Canetti和Karchmer的隐蔽(可验证)学习模型(TCC 2021)为经典学习算法提供了这些保证的框架。在这项工作中,我们提出了量子学习理论中的隐蔽可验证学习模型,并在受成熟的量子数据优势启发的远程数据访问场景中实现了这些模型,而不需要计算困难假设。我们考虑了两种隐私概念:(i)策略隐蔽,其中窃听者不会获得有关学习者策略的信息;(ii)目标隐蔽,其中窃听者不会获得有关正在学习的未知对象的信息。我们展示了:通过经典阴影进行量子统计查询的策略隐蔽算法;从公共量子示例和私人量子统计查询中学习二次函数的目标隐蔽算法,以及从公共多份副本和私人单份副本量子测量中学习Pauli阴影层析术和稳定态学习的目标隐蔽算法,以及通过公共量子查询和私人经典查询解决Forrelation和Simon问题的目标隐蔽算法,其中对手是单向或i.i.d.无辅助窃听者。尤其是最后一项结果表明,在隐蔽约束下,对于Forrelation和Simon问题,经典查询和量子查询之间的指数差异仍然存在。在此过程中,我们设计了用于从公共量子查询中获取量子数据的隐蔽可验证协议,这可能具有独立的兴趣。总的来说,我们的模型和相应的算法表明,即使在不受信任的远程数据情况下,量子优势也是可以私下和可验证地实现的。
更新时间: 2025-10-08 16:25:28
领域: quant-ph,cs.CR,cs.LG
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
Poisoning attacks can compromise the safety of large language models (LLMs) by injecting malicious documents into their training data. Existing work has studied pretraining poisoning assuming adversaries control a percentage of the training corpus. However, for large models, even small percentages translate to impractically large amounts of data. This work demonstrates for the first time that poisoning attacks instead require a near-constant number of documents regardless of dataset size. We conduct the largest pretraining poisoning experiments to date, pretraining models from 600M to 13B parameters on chinchilla-optimal datasets (6B to 260B tokens). We find that 250 poisoned documents similarly compromise models across all model and dataset sizes, despite the largest models training on more than 20 times more clean data. We also run smaller-scale experiments to ablate factors that could influence attack success, including broader ratios of poisoned to clean data and non-random distributions of poisoned samples. Finally, we demonstrate the same dynamics for poisoning during fine-tuning. Altogether, our results suggest that injecting backdoors through data poisoning may be easier for large models than previously believed as the number of poisons required does not scale up with model size, highlighting the need for more research on defences to mitigate this risk in future models.
Updated: 2025-10-08 16:25:05
标题: LLM毒化攻击需要近恒定数量的毒样
摘要: 中毒攻击可能会通过向大型语言模型(LLMs)的训练数据中注入恶意文档来危害安全。现有研究已经研究了以对手控制训练语料库的百分比为前提的预训练中毒。然而,对于大型模型而言,即使是小百分比也会转化为实际上大量的数据。这项工作首次证明,中毒攻击实际上需要一个接近恒定数量的文档,而不管数据集的大小。我们进行了迄今为止规模最大的预训练中毒实验,在毛丝鼠优化数据集上(6B到260B个标记)对从600M到13B个参数的模型进行预训练。我们发现,250个中毒文档同样会危害所有模型和数据集大小,尽管最大的模型训练的干净数据量超过20倍。我们还进行了规模较小的实验,以消除可能影响攻击成功的因素,包括中毒数据与干净数据的更广泛比例和中毒样本的非随机分布。最后,我们证明了在微调过程中进行中毒的相同动态。总的来说,我们的结果表明,通过数据中毒注入后门可能对大型模型来说比之前认为的更容易,因为所需的毒剂数量不随模型大小增加而增加,这凸显了需要更多研究来减轻未来模型中这种风险的防御措施的必要性。
更新时间: 2025-10-08 16:25:05
领域: cs.LG
Resolution scaling governs DINOv3 transfer performance in chest radiograph classification
Self-supervised learning (SSL) has advanced visual representation learning, but its value in chest radiography, a high-volume imaging modality with fine-grained findings, remains unclear. Meta's DINOv3 extends earlier SSL models through Gram-anchored self-distillation. Whether these design choices improve transfer learning for chest radiography has not been systematically tested. We benchmarked DINOv3 against DINOv2 and ImageNet initialization across seven datasets (n>814,000). Two representative backbones were evaluated: ViT-B/16 and ConvNeXt-B. Images were analyzed at 224x224, 512x512, and 1024x1024 pixels. We additionally assessed frozen features from a 7B model. The primary outcome was mean AUROC across labels. At 224x224, DINOv3 and DINOv2 achieved comparable performance on adult datasets. Increasing resolution to 512x512 yielded consistent improvements for DINOv3 over both DINOv2 and ImageNet. In contrast, results in pediatric cohort showed no differences across initializations. Across all settings, ConvNeXt-B outperformed ViT-B/16. Models using frozen DINOv3-7B features underperformed relative to fully finetuned 86-89M-parameter backbones, highlighting the importance of domain adaptation. Scaling to 1024x1024 did not further improve accuracy. Resolution-related gains were most evident for boundary-dependent and small focal abnormalities. In chest radiography, higher input resolution is critical for leveraging the benefits of modern self-supervised models. 512x512 pixels represent a practical upper limit where DINOv3-initialized ConvNeXt-B networks provide the strongest performance, while larger inputs offer minimal return on cost. Clinically, these findings support use of finetuned, mid-sized backbones at 512x512 for chest radiograph interpretation, with the greatest gains expected in detecting subtle or boundary-centered lesions relevant to emergency and critical care settings.
Updated: 2025-10-08 16:25:04
标题: 分辨率缩放决定DINOv3在胸部X光分类中的迁移性能
摘要: 自监督学习(SSL)已经推动了视觉表示学习的发展,但其在胸部放射学领域的价值仍不清楚,这是一个具有细粒度结果的高容量成像模态。Meta的DINOv3通过Gram锚定的自蒸馏扩展了早期的SSL模型。这些设计选择是否改善了胸部放射学的迁移学习尚未经过系统测试。我们在七个数据集(n > 814,000)上对DINOv3与DINOv2和ImageNet初始化进行了基准测试。评估了两个代表性的骨干:ViT-B/16和ConvNeXt-B。图像在224x224、512x512和1024x1024像素下进行分析。我们还评估了来自7B模型的冻结特征。主要结果是跨标签的平均AUROC。在224x224下,DINOv3和DINOv2在成人数据集上的表现相当。将分辨率提高到512x512可以使DINOv3相对于DINOv2和ImageNet都有持续改进。相比之下,儿童队列中的结果显示出初始化没有差异。在所有设置中,ConvNeXt-B胜过ViT-B/16。使用冻结的DINOv3-7B特性的模型相对于完全微调的86-89M参数骨干表现不佳,突出了领域适应的重要性。将分辨率提高到1024x1024并没有进一步提高准确性。与边界相关和小焦点异常相关的分辨率增益最为明显。在胸部放射学中,更高的输入分辨率对于利用现代自监督模型的好处至关重要。512x512像素代表了DINOv3初始化的ConvNeXt-B网络提供最强大的性能的实用上限,而更大的输入只提供了很少的成本回报。临床上,这些发现支持在512x512的胸部X光解释中使用微调的中等大小的骨干,预计在检测与急救和重症监护环境相关的细微或边界中心病变时将获得最大的收益。
更新时间: 2025-10-08 16:25:04
领域: cs.CV,cs.AI,cs.LG
From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models
Fingerprinting is critical for maintaining traceability and protecting the intellectual property (IP) of developers, as LLMs deployed in web applications are susceptible to unauthorized redistribution and misuse via fine-tuning or black-box deployment. However, current backdoor-based fingerprinting methods face a fundamental trade-off: fingerprints embedded as garbled text are easily detected and filtered, whereas those crafted as coherent natural language are prone to being triggered unintentionally. To overcome these limitations, we propose RFEdit, a knowledge-editing framework that embeds a rule-based multilingual natural language fingerprint (MNLF) by modifying a sparse subset of model weights. This approach enables efficient and robust fingerprint injection with minimal impact on unrelated knowledge in LLMs. Our RFEdit framework is further safeguarded by Fingerprint Subspace-aware Fine-Tuning (FSFT), which mitigates fingerprint degradation during legitimate fine-tuning by restricting parameter updates to the fingerprint subspace. This approach preserves fingerprint integrity while enhancing downstream task performance of LLMs. These advances establish a comprehensive pipeline from fingerprint injection to defense, achieving high detection effectiveness, robustness against adversarial manipulations, harmlessness to model utility, and persistence under fine-tuning. Extensive experiments demonstrate that RFEdit maintains robustness under quantization and pruning. Additionally, fingerprint effectiveness is generally improved by more than 10\% when combined with FSFT for math and alpaca downstream tasks.
Updated: 2025-10-08 16:23:32
标题: 从注射到防御:为大型语言模型构建基于编辑的指纹
摘要: 指纹技术对于维护可追溯性和保护开发者的知识产权(IP)至关重要,因为部署在Web应用程序中的LLMs容易受到未经授权的重新分发和滥用,通过微调或黑盒部署。然而,当前基于后门的指纹识别方法面临一个根本性的折衷:嵌入为混乱文本的指纹容易被检测和过滤,而那些精心制作为连贯自然语言的指纹则容易被意外触发。为了克服这些限制,我们提出了RFEdit,一个知识编辑框架,通过修改模型权重的稀疏子集来嵌入基于规则的多语言自然语言指纹(MNLF)。这种方法能够在LLMs中有效且稳健地注入指纹,对不相关知识的影响最小。我们的RFEdit框架进一步通过指纹子空间感知微调(FSFT)进行保护,通过限制参数更新到指纹子空间来减轻在合法微调过程中指纹退化的影响。这种方法在提升LLMs下游任务性能的同时保持指纹完整性。这些进展建立了从指纹注入到防御的全面流水线,实现高检测效果、抗对抗操纵、对模型效用无害以及在微调过程中持久性。大量实验证明,RFEdit在量化和修剪下保持稳健性。此外,与FSFT结合应用于数学和羊驼下游任务时,指纹效果通常提高超过10%。
更新时间: 2025-10-08 16:23:32
领域: cs.CL,cs.AI,cs.LG
Split Conformal Classification with Unsupervised Calibration
Methods for split conformal prediction leverage calibration samples to transform any prediction rule into a set-prediction rule that complies with a target coverage probability. Existing methods provide remarkably strong performance guarantees with minimal computational costs. However, they require to use calibration samples composed by labeled examples different to those used for training. This requirement can be highly inconvenient, as it prevents the use of all labeled examples for training and may require acquiring additional labels solely for calibration. This paper presents an effective methodology for split conformal prediction with unsupervised calibration for classification tasks. In the proposed approach, set-prediction rules are obtained using unsupervised calibration samples together with supervised training samples previously used to learn the classification rule. Theoretical and experimental results show that the presented methods can achieve performance comparable to that with supervised calibration, at the expenses of a moderate degradation in performance guarantees and computational efficiency.
Updated: 2025-10-08 16:22:41
标题: 无监督校准的分裂共形分类
摘要: 拆分一致性预测的方法利用校准样本将任何预测规则转化为一个符合目标覆盖概率的集合预测规则。现有方法在计算成本方面提供了非常强大的性能保证。然而,它们需要使用由带标签的示例组成的校准样本,这些示例与用于训练的示例不同。这个要求可能非常不方便,因为它阻止了将所有带标签的示例用于训练,并可能需要单独获取额外的标签用于校准。本文提出了一种有效的方法,用于分类任务的无监督校准的拆分一致性预测。在所提出的方法中,集合预测规则是使用无监督校准样本与之前用于学习分类规则的监督训练样本一起获得的。理论和实验结果表明,所提出的方法可以在性能保证和计算效率上达到与监督校准相媲美的表现,但会付出一定的性能保证和计算效率的适度降低。
更新时间: 2025-10-08 16:22:41
领域: stat.ML,cs.LG
Bridged Clustering for Representation Learning: Semi-Supervised Sparse Bridging
We introduce Bridged Clustering, a semi-supervised framework to learn predictors from any unpaired input $X$ and output $Y$ dataset. Our method first clusters $X$ and $Y$ independently, then learns a sparse, interpretable bridge between clusters using only a few paired examples. At inference, a new input $x$ is assigned to its nearest input cluster, and the centroid of the linked output cluster is returned as the prediction $\hat{y}$. Unlike traditional SSL, Bridged Clustering explicitly leverages output-only data, and unlike dense transport-based methods, it maintains a sparse and interpretable alignment. Through theoretical analysis, we show that with bounded mis-clustering and mis-bridging rates, our algorithm becomes an effective and efficient predictor. Empirically, our method is competitive with SOTA methods while remaining simple, model-agnostic, and highly label-efficient in low-supervision settings.
Updated: 2025-10-08 16:20:49
标题: 桥式聚类用于表示学习:半监督稀疏桥接
摘要: 我们介绍了桥接聚类(Bridged Clustering),这是一个半监督框架,可以从任意未配对的输入$X$和输出$Y$数据集中学习预测器。我们的方法首先独立地对$X$和$Y$进行聚类,然后只使用少量配对示例学习两个聚类之间的稀疏、可解释的桥接。在推断阶段,新的输入$x$被分配到其最近的输入聚类,然后返回与之链接的输出聚类的中心作为预测$\hat{y}$。与传统的半监督学习方法不同,桥接聚类明确利用了仅有输出数据,而不同于密集的基于传输的方法,它保持了一个稀疏且可解释的对齐。通过理论分析,我们表明在有界的错误聚类和错误桥接率下,我们的算法成为一种有效且高效的预测器。在实证研究中,我们的方法在保持简单、与模型无关且在低监督环境中高效利用标签的情况下,与现有技术水平方法竞争力强。
更新时间: 2025-10-08 16:20:49
领域: cs.LG
Bayesian Portfolio Optimization by Predictive Synthesis
Portfolio optimization is a critical task in investment. Most existing portfolio optimization methods require information on the distribution of returns of the assets that make up the portfolio. However, such distribution information is usually unknown to investors. Various methods have been proposed to estimate distribution information, but their accuracy greatly depends on the uncertainty of the financial markets. Due to this uncertainty, a model that could well predict the distribution information at one point in time may perform less accurately compared to another model at a different time. To solve this problem, we investigate a method for portfolio optimization based on Bayesian predictive synthesis (BPS), one of the Bayesian ensemble methods for meta-learning. We assume that investors have access to multiple asset return prediction models. By using BPS with dynamic linear models to combine these predictions, we can obtain a Bayesian predictive posterior about the mean rewards of assets that accommodate the uncertainty of the financial markets. In this study, we examine how to construct mean-variance portfolios and quantile-based portfolios based on the predicted distribution information.
Updated: 2025-10-08 16:18:11
标题: 贝叶斯组合优化通过预测综合
摘要: 投资组合优化是投资中的关键任务。大多数现有的投资组合优化方法需要有关构成投资组合的资产收益分布的信息。然而,这种分布信息通常对投资者来说是未知的。已经提出了各种方法来估计分布信息,但它们的准确性在很大程度上取决于金融市场的不确定性。由于这种不确定性,一个能够在某一时间点很好地预测分布信息的模型,可能在不同时间点与另一个模型相比表现不够准确。为了解决这个问题,我们研究了一种基于贝叶斯预测综合(BPS)的投资组合优化方法,这是用于元学习的贝叶斯集成方法之一。我们假设投资者可以访问多个资产收益预测模型。通过使用动态线性模型将这些预测结合起来,我们可以获得一个关于资产平均回报的贝叶斯预测后验,以适应金融市场的不确定性。在本研究中,我们研究了如何基于预测的分布信息构建均值-方差投资组合和基于分位数的投资组合。
更新时间: 2025-10-08 16:18:11
领域: econ.EM,cs.LG,q-fin.CP,q-fin.PM,stat.AP
Exposing LLM User Privacy via Traffic Fingerprint Analysis: A Study of Privacy Risks in LLM Agent Interactions
Large Language Models (LLMs) are increasingly deployed as agents that orchestrate tasks and integrate external tools to execute complex workflows. We demonstrate that these interactive behaviors leave distinctive fingerprints in encrypted traffic exchanged between users and LLM agents. By analyzing traffic patterns associated with agent workflows and tool invocations, adversaries can infer agent activities, distinguish specific agents, and even profile sensitive user attributes. To highlight this risk, we develop AgentPrint, which achieves an F1-score of 0.866 in agent identification and attains 73.9% and 69.1% top-3 accuracy in user attribute inference for simulated- and real-user settings, respectively. These results uncover an overlooked risk: the very interactivity that empowers LLM agents also exposes user privacy, underscoring the urgent need for technical countermeasures alongside regulatory and policy safeguards.
Updated: 2025-10-08 16:16:23
标题: 通过流量指纹分析揭示LLM用户隐私:LLM代理互动中隐私风险的研究
摘要: 大型语言模型(LLMs)越来越多地被部署为代理,用于编排任务并集成外部工具来执行复杂的工作流程。我们证明这些交互行为在用户和LLM代理之间交换的加密流量中留下了独特的指纹。通过分析与代理工作流程和工具调用相关的流量模式,对手可以推断代理活动,区分特定代理,并甚至对敏感用户属性进行个人资料化。为了凸显这一风险,我们开发了AgentPrint,该系统在代理识别方面实现了0.866的F1分数,并分别在模拟用户和真实用户设置中实现了73.9%和69.1%的前三准确度。这些结果揭示了一个被忽视的风险:赋予LLM代理力量的互动性也暴露了用户的隐私,强调了技术对策在监管和政策保障之外的紧急需要。
更新时间: 2025-10-08 16:16:23
领域: cs.CR
Quantifying Data Contamination in Psychometric Evaluations of LLMs
Recent studies apply psychometric questionnaires to Large Language Models (LLMs) to assess high-level psychological constructs such as values, personality, moral foundations, and dark traits. Although prior work has raised concerns about possible data contamination from psychometric inventories, which may threaten the reliability of such evaluations, there has been no systematic attempt to quantify the extent of this contamination. To address this gap, we propose a framework to systematically measure data contamination in psychometric evaluations of LLMs, evaluating three aspects: (1) item memorization, (2) evaluation memorization, and (3) target score matching. Applying this framework to 21 models from major families and four widely used psychometric inventories, we provide evidence that popular inventories such as the Big Five Inventory (BFI-44) and Portrait Values Questionnaire (PVQ-40) exhibit strong contamination, where models not only memorize items but can also adjust their responses to achieve specific target scores.
Updated: 2025-10-08 16:16:20
标题: 量化LLM心理测量评估中的数据污染
摘要: 最近的研究将心理测量问卷应用于大型语言模型(LLMs)以评估高级心理构建,如价值观、个性、道德基础和黑暗特质。尽管先前的工作已经提出了关于可能来自心理测量问卷的数据污染的担忧,这可能威胁到这种评估的可靠性,但尚未有系统性的尝试来量化这种污染的程度。为了填补这一空白,我们提出了一个框架来系统地衡量LLMs心理测量评估中的数据污染,评估三个方面:(1)项目记忆,(2)评估记忆和(3)目标分数匹配。将此框架应用于来自主要家族的21个模型和四个广泛使用的心理测量问卷,我们提供证据表明,像大五人格问卷(BFI-44)和画像价值观问卷(PVQ-40)这样的流行问卷存在严重污染,模型不仅记住项目,还可以调整他们的回应以实现特定的目标分数。
更新时间: 2025-10-08 16:16:20
领域: cs.CL,cs.LG
NurseLLM: The First Specialized Language Model for Nursing
Recent advancements in large language models (LLMs) have significantly transformed medical systems. However, their potential within specialized domains such as nursing remains largely underexplored. In this work, we introduce NurseLLM, the first nursing-specialized LLM tailored for multiple choice question-answering (MCQ) tasks. We develop a multi-stage data generation pipeline to build the first large scale nursing MCQ dataset to train LLMs on a broad spectrum of nursing topics. We further introduce multiple nursing benchmarks to enable rigorous evaluation. Our extensive experiments demonstrate that NurseLLM outperforms SoTA general-purpose and medical-specialized LLMs of comparable size on different benchmarks, underscoring the importance of a specialized LLM for the nursing domain. Finally, we explore the role of reasoning and multi-agent collaboration systems in nursing, highlighting their promise for future research and applications.
Updated: 2025-10-08 16:15:06
标题: NurseLLM:护理专用语言模型的首次应用
摘要: 最近对大型语言模型(LLMs)的进展显著改变了医疗系统。然而,它们在护理等专业领域的潜力仍然大部分被忽视。在这项工作中,我们介绍了NurseLLM,这是第一个专为多项选择题答题(MCQ)任务量身定制的护理专业LLM。我们开发了一个多阶段数据生成管道,建立了第一个大规模护理MCQ数据集,以训练LLMs涵盖广泛的护理主题。我们进一步引入了多个护理基准,以进行严格评估。我们的广泛实验表明,NurseLLM在不同基准上表现优于同等规模的通用用途和医疗专业LLMs,强调了护理领域专门LLM的重要性。最后,我们探讨了推理和多智能体协作系统在护理中的作用,突出它们对未来研究和应用的潜力。
更新时间: 2025-10-08 16:15:06
领域: cs.CL,cs.LG
A multi-layered embedded intrusion detection framework for programmable logic controllers
Industrial control system (ICS) operations use trusted endpoints like human machine interfaces (HMIs) and workstations to relay commands to programmable logic controllers (PLCs). Because most PLCs lack layered defenses, compromise of a trusted endpoint can drive unsafe actuator commands and risk safety-critical operation. This research presents an embedded intrusion detection system that runs inside the controller and uses header-level telemetry to detect and respond to network attacks. The system combines a semi-supervised anomaly detector and a supervised attack classifier. We evaluate the approach on a midstream oil-terminal testbed using three datasets collected during tanker-truck loading. The anomaly detector achieves zero missed attacks, corresponding to 0.998 Matthews correlation. The supervised stage attains 97.37 percent hold-out accuracy and 97.03 percent external accuracy. The embedded design adds a median of 2,031 microseconds of end-to-end latency and does not impact PLC's cycle time. The proposed architecture provides a multi-layer embedded security that meets the real-time requirements of an industrial system.
Updated: 2025-10-08 16:12:02
标题: 一个多层嵌入式可编程逻辑控制器入侵检测框架
摘要: 工业控制系统(ICS)操作使用可信的终端,如人机界面(HMIs)和工作站,将命令传递给可编程逻辑控制器(PLCs)。由于大多数PLC缺乏分层防御,对可信终端的 compromise 可以驱动不安全的执行器命令,从而危及安全关键操作。本研究提出了一种嵌入式入侵检测系统,该系统在控制器内部运行,并使用头部级遥测来检测和响应网络攻击。该系统结合了半监督异常检测器和监督攻击分类器。我们在一个中游油码头测试平台上评估了该方法,使用在油罐车装载期间收集的三个数据集。异常检测器实现了零漏报攻击,相应的 Matthews 相关系数为0.998。监督阶段实现了97.37%的留出准确率和97.03%的外部准确率。嵌入式设计增加了2,031微秒的端到端延迟,不影响PLC的周期时间。所提出的架构提供了一种满足工业系统实时要求的多层嵌入式安全性。
更新时间: 2025-10-08 16:12:02
领域: cs.CR
Autonomy-Aware Clustering: When Local Decisions Supersede Global Prescriptions
Clustering arises in a wide range of problem formulations, yet most existing approaches assume that the entities under clustering are passive and strictly conform to their assigned groups. In reality, entities often exhibit local autonomy, overriding prescribed associations in ways not fully captured by feature representations. Such autonomy can substantially reshape clustering outcomes -- altering cluster compositions, geometry, and cardinality -- with significant downstream effects on inference and decision-making. We introduce autonomy-aware clustering, a reinforcement learning (RL) framework that learns and accounts for the influence of local autonomy without requiring prior knowledge of its form. Our approach integrates RL with a Deterministic Annealing (DA) procedure, where, to determine underlying clusters, DA naturally promotes exploration in early stages of annealing and transitions to exploitation later. We also show that the annealing procedure exhibits phase transitions that enable design of efficient annealing schedules. To further enhance adaptability, we propose the Adaptive Distance Estimation Network (ADEN), a transformer-based attention model that learns dependencies between entities and cluster representatives within the RL loop, accommodates variable-sized inputs and outputs, and enables knowledge transfer across diverse problem instances. Empirical results show that our framework closely aligns with underlying data dynamics: even without explicit autonomy models, it achieves solutions close to the ground truth (gap ~3-4%), whereas ignoring autonomy leads to substantially larger gaps (~35-40%). The code and data are publicly available at https://github.com/salar96/AutonomyAwareClustering.
Updated: 2025-10-08 16:05:52
标题: 自治感知聚类:当本地决策超越全局规定时
摘要: 聚类在各种问题表述中广泛出现,然而大多数现有方法都假设在聚类中的实体是被动的,并且严格服从于其分配的组。然而在现实中,实体往往表现出局部自治性,以不完全被特征表示捕捉到的方式覆盖规定的关联。这种自治性可以显著地重塑聚类结果--改变聚类的组成、几何形状和基数--对推断和决策制定产生重大的下游影响。我们引入了一种自主感知聚类的强化学习(RL)框架,该框架学习和考虑局部自治性的影响,而不需要先验知识其形式。我们的方法将RL与确定性退火(DA)过程相结合,DA自然地在退火的早期阶段促进探索,并在后期过渡到开发。我们还展示了退火过程具有相变,从而能够设计出高效的退火计划。为了进一步增强适应性,我们提出了自适应距离估计网络(ADEN),这是一种基于变压器的注意力模型,它在RL循环中学习实体和聚类代表之间的依赖关系,容纳可变大小的输入和输出,并实现跨不同问题实例的知识转移。实证结果表明我们的框架与底层数据动态密切相关:即使没有明确的自治模型,它也可以实现接近地面真相的解决方案(差距约为3-4%),而忽视自治性会导致更大的差距(约为35-40%)。代码和数据可在https://github.com/salar96/AutonomyAwareClustering 上公开获取。
更新时间: 2025-10-08 16:05:52
领域: cs.LG,cs.AI
On Task Vectors and Gradients
Task arithmetic has emerged as a simple yet powerful technique for model merging, enabling the combination of multiple finetuned models into one. Despite its empirical success, a clear theoretical explanation of why and when it works is lacking. This paper provides a rigorous theoretical foundation for task arithmetic by establishing a connection between task vectors and gradients of the task losses. We show that under standard gradient descent, a task vector generated from one epoch of finetuning is exactly equivalent to the negative gradient of the loss, scaled by the learning rate. For the practical multi-epoch setting, we prove that this equivalence holds approximately, with a second-order error term that we explicitly bound for feed-forward networks. Our empirical analysis across seven vision benchmarks corroborates our theory, demonstrating that the first-epoch gradient dominates the finetuning trajectory in both norm and direction. A key implication is that merging models finetuned for only a single epoch often yields performance comparable to merging fully converged models. These findings reframe task arithmetic as a form of approximate multitask learning, providing a clear rationale for its effectiveness and highlighting the critical role of early training dynamics in model merging.
Updated: 2025-10-08 16:00:50
标题: 关于任务向量和梯度
摘要: 任务算术已经成为一种简单但强大的技术,用于模型合并,使多个微调模型合并为一个。尽管在经验上取得了成功,但为什么以及何时有效的清晰的理论解释尚未提供。本文通过建立任务向量和任务损失的梯度之间的联系,为任务算术提供了严格的理论基础。我们展示,在标准梯度下降的情况下,从一次微调生成的任务向量恰好等同于损失的负梯度,乘以学习率。对于实际的多次迭代设置,我们证明了这种等价性近似成立,具有一个明确界定的二阶误差项,适用于前馈网络。我们在七个视觉基准上的实证分析支持了我们的理论,表明第一次迭代的梯度在微调轨迹中在范数和方向上占主导地位。一个关键的含义是,将仅进行了单次迭代微调的模型合并通常会产生与合并完全收敛的模型相媲美的性能。这些发现重新构想了任务算术作为一种近似多任务学习的形式,为其有效性提供了明确的理由,并突出了早期训练动态在模型合并中的关键作用。
更新时间: 2025-10-08 16:00:50
领域: cs.LG,cs.AI
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
Real-world robotic agents must act under partial observability and long horizons, where key cues may appear long before they affect decision making. However, most modern approaches rely solely on instantaneous information, without incorporating insights from the past. Standard recurrent or transformer models struggle with retaining and leveraging long-term dependencies: context windows truncate history, while naive memory extensions fail under scale and sparsity. We propose ELMUR (External Layer Memory with Update/Rewrite), a transformer architecture with structured external memory. Each layer maintains memory embeddings, interacts with them via bidirectional cross-attention, and updates them through an Least Recently Used (LRU) memory module using replacement or convex blending. ELMUR extends effective horizons up to 100,000 times beyond the attention window and achieves a 100% success rate on a synthetic T-Maze task with corridors up to one million steps. In POPGym, it outperforms baselines on more than half of the tasks. On MIKASA-Robo sparse-reward manipulation tasks with visual observations, it nearly doubles the performance of strong baselines. These results demonstrate that structured, layer-local external memory offers a simple and scalable approach to decision making under partial observability.
Updated: 2025-10-08 15:50:34
标题: ELMUR: 长时间线强化学习的具有更新/重写功能的外部层内存
摘要: 真实世界的机器人代理必须在部分可观测性和长期规划下行动,关键线索可能出现在影响决策之前很久。然而,大多数现代方法仅依赖即时信息,没有融入过去的见解。标准的循环或转换器模型在保留和利用长期依赖性方面存在困难:上下文窗口截断历史,而朴素的内存扩展在规模和稀疏性下失败。我们提出ELMUR(带有更新/重写的外部层内存),这是一种带有结构化外部内存的转换器架构。每个层维护内存嵌入,通过双向交叉注意力与其交互,并通过最近最少使用(LRU)内存模块使用替换或凸混合更新它们。ELMUR将有效范围延伸到关注窗口之外的100,000倍,并在具有长达一百万步的走廊的合成T迷宫任务上实现了100%的成功率。在POPGym中,它在超过一半的任务上优于基线。在MIKASA-Robo稀疏奖励操纵任务中,凭借视觉观察,它将强基线的表现几乎提高了一倍。这些结果表明,结构化的、层本地的外部内存为在部分可观测性下进行决策提供了一种简单且可扩展的方法。
更新时间: 2025-10-08 15:50:34
领域: cs.LG,cs.AI,cs.RO
A Multi-Agent Framework for Stateful Inference-Time Search
Recent work explores agentic inference-time techniques to perform structured, multi-step reasoning. However, stateless inference often struggles on multi-step tasks due to the absence of persistent state. Moreover, task-specific fine-tuning or instruction-tuning often achieve surface-level code generation but remain brittle on tasks requiring deeper reasoning and long-horizon dependencies. To address these limitations, we propose stateful multi-agent evolutionary search, a training-free framework that departs from prior stateless approaches by combining (i) persistent inference-time state, (ii) adversarial mutation, and (iii) evolutionary preservation. We demonstrate its effectiveness in automated unit test generation through the generation of edge cases. We generate robust edge cases using an evolutionary search process, where specialized agents sequentially propose, mutate, and score candidates. A controller maintains persistent state across generations, while evolutionary preservation ensures diversity and exploration across all possible cases. This yields a generalist agent capable of discovering robust, high-coverage edge cases across unseen codebases. Experiments show our stateful multi-agent inference framework achieves substantial gains in coverage over stateless single-step baselines, evaluated on prevalent unit-testing benchmarks such as HumanEval and TestGenEvalMini and using three diverse LLM families - Llama, Gemma, and GPT. These results indicate that combining persistent inference-time state with evolutionary search materially improves unit-test generation.
Updated: 2025-10-08 15:48:41
标题: 一个用于有状态推理时间搜索的多智能体框架
摘要: 最近的工作探索了主观推断时间技术,用于执行结构化、多步推理。然而,由于缺乏持久状态,状态无关推断通常在多步任务上表现不佳。此外,任务特定的微调或指令微调通常可以实现表面级别的代码生成,但在需要更深层次推理和长期依赖的任务上仍然不稳定。为了解决这些限制,我们提出了一种有状态的多代理进化搜索,这是一个无需训练的框架,与先前的无状态方法不同,它结合了持久的推断时间状态、对抗性突变和进化保护。我们通过生成边缘情况来展示其在自动化单元测试生成中的有效性。我们使用进化搜索过程生成强健的边缘情况,专门的代理按顺序提出、突变和评分候选者。一个控制器在各代之间维护持久状态,而进化保护确保在所有可能情况下的多样性和探索。这产生了一个通用代理,能够发现跨未见代码库的强健、高覆盖率的边缘情况。实验表明,我们的有状态多代理推理框架在覆盖率上取得了显著的增益,评估了流行的单元测试基准,如HumanEval和TestGenEvalMini,并使用了三个不同的LLM系列 - Llama、Gemma和GPT。这些结果表明,将持久的推断时间状态与进化搜索相结合实质性地改善了单元测试生成。
更新时间: 2025-10-08 15:48:41
领域: cs.LG,cs.AI,cs.CL,cs.MA,cs.SE
Security through the Eyes of AI: How Visualization is Shaping Malware Detection
Malware, a persistent cybersecurity threat, increasingly targets interconnected digital systems such as desktop, mobile, and IoT platforms through sophisticated attack vectors. By exploiting these vulnerabilities, attackers compromise the integrity and resilience of modern digital ecosystems. To address this risk, security experts actively employ Machine Learning or Deep Learning-based strategies, integrating static, dynamic, or hybrid approaches to categorize malware instances. Despite their advantages, these methods have inherent drawbacks and malware variants persistently evolve with increased sophistication, necessitating advancements in detection strategies. Visualization-based techniques are emerging as scalable and interpretable solutions for detecting and understanding malicious behaviors across diverse platforms including desktop, mobile, IoT, and distributed systems as well as through analysis of network packet capture files. In this comprehensive survey of more than 100 high-quality research articles, we evaluate existing visualization-based approaches applied to malware detection and classification. As a first contribution, we propose a new all-encompassing framework to study the landscape of visualization-based malware detection techniques. Within this framework, we systematically analyze state-of-the-art approaches across the critical stages of the malware detection pipeline. By analyzing not only the single techniques but also how they are combined to produce the final solution, we shed light on the main challenges in visualization-based approaches and provide insights into the advancements and potential future directions in this critical field.
Updated: 2025-10-08 15:38:44
标题: 人工智能视角下的安全:可视化如何影响恶意软件检测
摘要: 恶意软件是一种持续存在的网络安全威胁,越来越多地针对互连的数字系统,如桌面、移动和物联网平台,通过复杂的攻击向量进行攻击。通过利用这些漏洞,攻击者破坏了现代数字生态系统的完整性和韧性。为了应对这种风险,安全专家积极采用基于机器学习或深度学习的策略,整合静态、动态或混合方法来对恶意软件实例进行分类。尽管这些方法有优势,但它们也存在固有的缺点,而恶意软件变种不断演变并日益复杂化,需要进一步推进检测策略。 基于可视化的技术正逐渐成为一种可扩展且易于理解的解决方案,用于检测和理解跨不同平台(包括桌面、移动、物联网和分布式系统)以及通过分析网络数据包捕获文件来检测恶意行为。在这份超过100篇高质量研究文章的综述中,我们评估了应用于恶意软件检测和分类的现有基于可视化的方法。作为首次贡献,我们提出了一个新的全面框架来研究基于可视化的恶意软件检测技术的景观。在这个框架内,我们系统地分析了在恶意软件检测管道的关键阶段中的最新方法。通过不仅分析单一技术,还分析它们如何结合以产生最终解决方案,我们揭示了基于可视化方法中的主要挑战,并提供了对这一关键领域的发展和潜在未来方向的见解。
更新时间: 2025-10-08 15:38:44
领域: cs.CR
AbsoluteNet: A Deep Learning Neural Network to Classify Cerebral Hemodynamic Responses of Auditory Processing
In recent years, deep learning (DL) approaches have demonstrated promising results in decoding hemodynamic responses captured by functional near-infrared spectroscopy (fNIRS), particularly in the context of brain-computer interface (BCI) applications. This work introduces AbsoluteNet, a novel deep learning architecture designed to classify auditory event-related responses recorded using fNIRS. The proposed network is built upon principles of spatio-temporal convolution and customized activation functions. Our model was compared against several models, namely fNIRSNET, MDNN, DeepConvNet, and ShallowConvNet. The results showed that AbsoluteNet outperforms existing models, reaching 87.0% accuracy, 84.8% sensitivity, and 89.2% specificity in binary classification, surpassing fNIRSNET, the second-best model, by 3.8% in accuracy. These findings underscore the effectiveness of our proposed deep learning model in decoding hemodynamic responses related to auditory processing and highlight the importance of spatio-temporal feature aggregation and customized activation functions to better fit fNIRS dynamics.
Updated: 2025-10-08 15:37:01
标题: AbsoluteNet:一种用于分类听觉加工大脑血液动力学反应的深度学习神经网络
摘要: 近年来,深度学习(DL)方法在解码功能性近红外光谱(fNIRS)捕捉到的血液动力学响应方面展现出了令人期待的结果,特别是在脑机接口(BCI)应用的背景下。本文介绍了AbsoluteNet,一种新颖的深度学习架构,旨在对使用fNIRS记录的听觉事件相关响应进行分类。所提出的网络建立在时空卷积和定制激活函数的原则之上。我们的模型与几种模型进行了比较,包括fNIRSNET、MDNN、DeepConvNet和ShallowConvNet。结果显示,AbsoluteNet在二元分类中表现出色,准确率达到87.0%,灵敏度为84.8%,特异度为89.2%,超越了fNIRSNET,第二好的模型,准确率高出3.8%。这些发现强调了我们提出的深度学习模型在解码与听觉处理相关的血液动力学响应方面的有效性,并突显了时空特征聚合和定制激活函数的重要性,以更好地适应fNIRS动态。
更新时间: 2025-10-08 15:37:01
领域: cs.LG,cs.SD,eess.AS
FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models
Low-rank optimization has emerged as a promising direction in training large language models (LLMs) to improve running time and reduce the memory usage of adaptive optimizers by constraining learning to a lower-dimensional space. Prior work typically projects gradients of linear layers using approaches based on Singular Value Decomposition (SVD) or QR-decomposition. Applying these techniques individually to each layer in large models is computationally expensive and incurs additional memory costs due to storing the projection matrices. In this work, we propose a computationally efficient and conceptually simple, two-step procedure to approximate SVD/QR-based gradient projections into lower-dimensional spaces by using a predefined orthogonal matrix of the Discrete Cosine Transform (DCT). We dynamically select columns from the DCT matrix based on their alignment with the gradient of each layer. The effective projection matrices are obtained via a simple matmul with the DCT matrix in $O(n^3)$ time, followed by a lightweight sorting step to identify the most relevant basis vectors. For large layers, DCT can be computed via Makhoul's $N$-point algorithm based on Fast Fourier Transform (FFT) in $O(n^2 \log(n))$ time. Due to the predefined nature of the orthogonal bases, they are computed once at the start of training. Our numerical experiments on both pre-training and fine-tuning tasks demonstrate the effectiveness of our dual strategy in approximating optimal low-rank projections, obtaining an approach with rank-independent running time that matches the performance of costly SVD/QR-based methods while achieving faster runtime and reduced memory usage by up to $25\%$ across different model sizes. Our code is available at \href{https://github.com/IST-DASLab/ISTA-DASLab-Optimizers}{\texttt{https://github.com/IST-DASLab/ISTA-DASLab-Optimizers}}.
Updated: 2025-10-08 15:33:25
标题: "基于FFT的动态子空间选择用于大型语言模型的低秩自适应优化"
摘要: 低秩优化已经成为训练大型语言模型(LLMs)的一个有前途的方向,可以通过将学习限制在低维空间中来改善运行时间和减少自适应优化器的内存使用。先前的工作通常使用基于奇异值分解(SVD)或QR分解的方法来投影线性层的梯度。将这些技术分别应用于大型模型中的每一层是计算昂贵的,并且由于存储投影矩阵而产生额外的内存成本。在这项工作中,我们提出了一个计算效率高、概念简单的两步过程,通过使用离散余弦变换(DCT)的预定义正交矩阵,将基于SVD/QR的梯度投影近似到低维空间中。我们根据每一层梯度与DCT矩阵的对齐性动态选择DCT矩阵中的列。通过在$O(n^3)$时间内使用DCT矩阵简单地进行矩阵相乘,然后通过轻量级的排序步骤来识别最相关的基础向量,得到有效的投影矩阵。对于大型层,可以通过基于快速傅里叶变换(FFT)的Makhoul的$N$点算法在$O(n^2 \log(n))$时间内计算DCT。由于正交基的预定义性质,它们在训练开始时只需计算一次。我们在预训练和微调任务上的数值实验表明,我们的双重策略在近似最佳低秩投影方面的有效性,获得了一个运行时间与秩无关的方法,可与昂贵的SVD/QR方法的性能相匹配,同时在不同模型大小上实现了更快的运行时间和减少了高达25%的内存使用。我们的代码可在\href{https://github.com/IST-DASLab/ISTA-DASLab-Optimizers}{\texttt{https://github.com/IST-DASLab/ISTA-DASLab-Optimizers}}找到。
更新时间: 2025-10-08 15:33:25
领域: cs.LG,cs.AI
Spectral Graph Clustering under Differential Privacy: Balancing Privacy, Accuracy, and Efficiency
We study the problem of spectral graph clustering under edge differential privacy (DP). Specifically, we develop three mechanisms: (i) graph perturbation via randomized edge flipping combined with adjacency matrix shuffling, which enforces edge privacy while preserving key spectral properties of the graph. Importantly, shuffling considerably amplifies the guarantees: whereas flipping edges with a fixed probability alone provides only a constant epsilon edge DP guarantee as the number of nodes grows, the shuffled mechanism achieves (epsilon, delta) edge DP with parameters that tend to zero as the number of nodes increase; (ii) private graph projection with additive Gaussian noise in a lower-dimensional space to reduce dimensionality and computational complexity; and (iii) a noisy power iteration method that distributes Gaussian noise across iterations to ensure edge DP while maintaining convergence. Our analysis provides rigorous privacy guarantees and a precise characterization of the misclassification error rate. Experiments on synthetic and real-world networks validate our theoretical analysis and illustrate the practical privacy-utility trade-offs.
Updated: 2025-10-08 15:30:27
标题: 在差分隐私下的谱图聚类:平衡隐私、准确性和效率
摘要: 我们研究了在边缘差分隐私(DP)下的谱图聚类问题。具体来说,我们开发了三种机制:(i)通过随机边翻转和邻接矩阵重排来实现图形扰动,从而在保留图形的关键谱特性的同时强制执行边缘隐私。重要的是,重排大大增强了保证:尽管仅通过固定概率翻转边缘在节点数量增加时只能提供恒定的 epsilon 边缘 DP 保证,但重排机制实现了(epsilon,delta)边缘 DP,其参数随节点数量增加而趋于零;(ii)在低维空间中通过添加高斯噪声进行私有图形投影,以减少维数和计算复杂度;以及(iii)一种带有高斯噪声的幂迭代方法,通过迭代分配高斯噪声来确保边缘 DP,同时保持收敛性。我们的分析提供了严格的隐私保证和对误分类错误率的精确描述。对合成和真实网络的实验验证了我们的理论分析,并展示了实际隐私-效用的权衡。
更新时间: 2025-10-08 15:30:27
领域: cs.IT,cs.CR,cs.LG,cs.SI,math.IT
DPMM-CFL: Clustered Federated Learning via Dirichlet Process Mixture Model Nonparametric Clustering
Clustered Federated Learning (CFL) improves performance under non-IID client heterogeneity by clustering clients and training one model per cluster, thereby balancing between a global model and fully personalized models. However, most CFL methods require the number of clusters K to be fixed a priori, which is impractical when the latent structure is unknown. We propose DPMM-CFL, a CFL algorithm that places a Dirichlet Process (DP) prior over the distribution of cluster parameters. This enables nonparametric Bayesian inference to jointly infer both the number of clusters and client assignments, while optimizing per-cluster federated objectives. This results in a method where, at each round, federated updates and cluster inferences are coupled, as presented in this paper. The algorithm is validated on benchmark datasets under Dirichlet and class-split non-IID partitions.
Updated: 2025-10-08 15:27:08
标题: DPMM-CFL:通过狄利克雷过程混合模型非参数聚类实现的聚类式联邦学习
摘要: Clustered Federated Learning (CFL)通过对客户端进行聚类并为每个聚类训练一个模型来改善在非独立同分布(non-IID)客户端异质性下的性能,从而在全局模型和完全个性化模型之间取得平衡。然而,大多数CFL方法要求事先确定集群数K,当潜在结构未知时,这是不切实际的。我们提出了DPMM-CFL,这是一种在集群参数分布上放置狄利克雷过程(DP)先验的CFL算法。这使得非参数贝叶斯推断能够共同推断出集群数和客户端分配,同时优化每个集群的联合学习目标。这导致一种方法,在每一轮中,联邦更新和集群推断是耦合的,就像本文所展示的那样。该算法在狄利克雷和类别分割的非独立同分布分区下的基准数据集上进行了验证。
更新时间: 2025-10-08 15:27:08
领域: cs.LG,cs.DC,stat.ML
Enjoying Non-linearity in Multinomial Logistic Bandits
We consider the multinomial logistic bandit problem, a variant of where a learner interacts with an environment by selecting actions to maximize expected rewards based on probabilistic feedback from multiple possible outcomes. In the binary setting, recent work has focused on understanding the impact of the non-linearity of the logistic model (Faury et al., 2020; Abeille et al., 2021). They introduced a problem-dependent constant $\kappa_* \geq 1$, that may be exponentially large in some problem parameters and which is captured by the derivative of the sigmoid function. It encapsulates the non-linearity and improves existing regret guarantees over $T$ rounds from $\smash{O(d\sqrt{T})}$ to $\smash{O(d\sqrt{T/\kappa_*})}$, where $d$ is the dimension of the parameter space. We extend their analysis to the multinomial logistic bandit framework, making it suitable for complex applications with more than two choices, such as reinforcement learning or recommender systems. To achieve this, we extend the definition of $\kappa_*$ to the multinomial setting and propose an efficient algorithm that leverages the problem's non-linearity. Our method yields a problem-dependent regret bound of order $ \smash{\widetilde{\mathcal{O}}( R d \sqrt{{KT}/{\kappa_*}})} $, where $R$ is the norm of the vector of rewards and $K$ is the number of outcomes. This improves upon the best existing guarantees of order $ \smash{\widetilde{\mathcal{O}}( RdK \sqrt{T} )} $. Moreover, we provide a $\smash{ \Omega(Rd\sqrt{KT/\kappa_*})}$ lower-bound, showing that our algorithm is minimax-optimal and that our definition of $\kappa_*$ is optimal.
Updated: 2025-10-08 15:15:45
标题: 享受多项逻辑回归赌博中的非线性
摘要: 我们考虑多项式逻辑赌博问题,这是一个变体,其中学习者通过选择行动与环境进行交互,以最大化基于多种可能结果的概率反馈的预期奖励。在二元设置中,最近的研究集中于理解逻辑模型的非线性(Faury等,2020年;Abeille等,2021年)的影响。他们引入了一个问题相关的常数$\kappa_* \geq 1$,在某些问题参数上可能呈指数增长,并且由Sigmoid函数的导数捕捉到。它包含了非线性,并将现有的回报保证从$O(d\sqrt{T})$改进到$O(d\sqrt{T/\kappa_*})$,其中$d$是参数空间的维度。我们将他们的分析扩展到多项式逻辑赌博框架,使其适用于具有超过两个选择的复杂应用,例如强化学习或推荐系统。为了实现这一点,我们将$\kappa_*$的定义扩展到多项式设置,并提出一种利用问题非线性的高效算法。我们的方法产生一个关于问题的依赖性遗憾界的阶$ \widetilde{\mathcal{O}}( R d \sqrt{{KT}/{\kappa_*}})$,其中$R$是奖励向量的范数,$K$是结果的数量。这超过了现有最佳保证的阶$ \widetilde{\mathcal{O}}( RdK \sqrt{T} )$。此外,我们提供了一个$\Omega(Rd\sqrt{KT/\kappa_*})$的下限,表明我们的算法是极小极优的,并且我们对$\kappa_*$的定义是最优的。
更新时间: 2025-10-08 15:15:45
领域: stat.ML,cs.AI,cs.LG,math.ST,stat.TH
MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning
Learning object-level, structured representations is widely regarded as a key to better generalization in vision and underpins the design of next-generation Pre-trained Vision Models (PVMs). Mainstream Object-Centric Learning (OCL) methods adopt Slot Attention or its variants to iteratively aggregate objects' super-pixels into a fixed set of query feature vectors, termed slots. However, their reliance on a static slot count leads to an object being represented as multiple parts when the number of objects varies. We introduce MetaSlot, a plug-and-play Slot Attention variant that adapts to variable object counts. MetaSlot (i) maintains a codebook that holds prototypes of objects in a dataset by vector-quantizing the resulting slot representations; (ii) removes duplicate slots from the traditionally aggregated slots by quantizing them with the codebook; and (iii) injects progressively weaker noise into the Slot Attention iterations to accelerate and stabilize the aggregation. MetaSlot is a general Slot Attention variant that can be seamlessly integrated into existing OCL architectures. Across multiple public datasets and tasks--including object discovery and recognition--models equipped with MetaSlot achieve significant performance gains and markedly interpretable slot representations, compared with existing Slot Attention variants.
Updated: 2025-10-08 15:14:03
标题: MetaSlot:突破物体中心学习中固定槽位数量
摘要: 学习对象级结构化表示被广泛认为是视觉中更好泛化的关键,也支撑着下一代预训练视觉模型(PVMs)的设计。主流的对象中心学习(OCL)方法采用槽注意力或其变体,逐步将对象的超像素聚合成一组固定的查询特征向量,称为槽。然而,它们对静态槽计数的依赖导致当对象数量变化时,一个对象被表示为多个部分。我们引入了MetaSlot,一种可以适应变量对象计数的即插即用的槽注意力变体。MetaSlot(i)通过对结果槽表示进行向量量化来维护一个包含数据集中对象原型的码书;(ii)通过与码书量化来消除传统聚合槽中的重复槽;(iii)在槽注意力迭代中逐渐注入较弱的噪声,以加速和稳定聚合过程。MetaSlot是一种通用的槽注意力变体,可以无缝集成到现有的OCL架构中。在多个公共数据集和任务上,包括对象发现和识别,装备MetaSlot的模型相比现有的槽注意力变体取得了显著的性能提升和明显的可解释的槽表示。
更新时间: 2025-10-08 15:14:03
领域: cs.CV,cs.LG
TRIM: Token-wise Attention-Derived Saliency for Data-Efficient Instruction Tuning
Instruction tuning is essential for aligning large language models (LLMs) to downstream tasks and commonly relies on large, diverse corpora. However, small, high-quality subsets, known as coresets, can deliver comparable or superior results, though curating them remains challenging. Existing methods often rely on coarse, sample-level signals like gradients, an approach that is computationally expensive and overlooks fine-grained features. To address this, we introduce TRIM (Token Relevance via Interpretable Multi-layer Attention), a forward-only, token-centric framework. Instead of using gradients, TRIM operates by matching underlying representational patterns identified via attention-based "fingerprints" from a handful of target samples. Such an approach makes TRIM highly efficient and uniquely sensitive to the structural features that define a task. Coresets selected by our method consistently outperform state-of-the-art baselines by up to 9% on downstream tasks and even surpass the performance of full-data fine-tuning in some settings. By avoiding expensive backward passes, TRIM achieves this at a fraction of the computational cost. These findings establish TRIM as a scalable and efficient alternative for building high-quality instruction-tuning datasets.
Updated: 2025-10-08 15:11:04
标题: TRIM:基于标记注意力推导的数据高效指令调整显著性
摘要: 指导调优对于将大型语言模型(LLMs)与下游任务对齐至关重要,通常依赖于大型、多样化的语料库。然而,小型、高质量的子集,即coresets,可以提供可比较或更优秀的结果,尽管筛选它们仍具有挑战性。现有方法通常依赖于粗糙的、样本级别的信号,如梯度,这种方法在计算上昂贵且忽略了细粒度特征。为了解决这个问题,我们引入了TRIM(通过可解释的多层注意力实现Token相关性),这是一个仅向前的、以token为中心的框架。TRIM不使用梯度,而是通过匹配通过少数目标样本识别的基础表征模式来进行操作,这种方法使TRIM高效且对定义任务的结构特征具有独特的敏感性。我们的方法选择的coresets在下游任务上始终比最先进的基线表现优越高达9%,甚至在某些情况下甚至超过了全数据微调的性能。通过避免昂贵的反向传递,TRIM以较小的计算成本实现了这一点。这些发现将TRIM确立为构建高质量指导调优数据集的可扩展和高效的替代方案。
更新时间: 2025-10-08 15:11:04
领域: cs.CL,cs.LG
The Contingencies of Physical Embodiment Allow for Open-Endedness and Care
Physical vulnerability and mortality are often seen as obstacles to be avoided in the development of artificial agents, which struggle to adapt to open-ended environments and provide aligned care. Meanwhile, biological organisms survive, thrive, and care for each other in an open-ended physical world with relative ease and efficiency. Understanding the role of the conditions of life in this disparity can aid in developing more robust, adaptive, and caring artificial agents. Here we define two minimal conditions for physical embodiment inspired by the existentialist phenomenology of Martin Heidegger: being-in-the-world (the agent is a part of the environment) and being-towards-death (unless counteracted, the agent drifts toward terminal states due to the second law of thermodynamics). We propose that from these conditions we can obtain both a homeostatic drive - aimed at maintaining integrity and avoiding death by expending energy to learn and act - and an intrinsic drive to continue to do so in as many ways as possible. Drawing inspiration from Friedrich Nietzsche's existentialist concept of will-to-power, we examine how intrinsic drives to maximize control over future states, e.g., empowerment, allow agents to increase the probability that they will be able to meet their future homeostatic needs, thereby enhancing their capacity to maintain physical integrity. We formalize these concepts within a reinforcement learning framework, which enables us to examine how intrinsically driven embodied agents learning in open-ended multi-agent environments may cultivate the capacities for open-endedness and care.ov
Updated: 2025-10-08 15:10:26
标题: 身体化的偶然性为开放性和关怀提供可能性
摘要: 身体脆弱性和死亡通常被视为发展人工智能代理的障碍,这些代理很难适应开放式环境并提供对齐的关怀。与此同时,生物有机体在一个开放式的物理世界中生存、茁壮成长并相互关怀,相对轻松高效。了解生活条件在这种差异中的作用可以帮助开发更健壮、适应性更强、更关爱的人工智能代理。在这里,我们通过马丁·海德格尔的存在主义现象学,定义了两个最小的物理具象条件:存在于世界中(代理是环境的一部分)和存在于死亡中(除非采取对抗措施,代理会由于热力学第二定律而向终端状态漂移)。我们提出,从这些条件中我们可以获得两种驱动力 - 一个旨在通过消耗能量学习和行动来维持完整性和避免死亡的稳态驱动力,以及一种内在驱动力,继续以尽可能多的方式这样做。受弗里德里希·尼采的存在主义意志到权力概念的启发,我们审视了内在驱动力最大化对未来状态的控制的方式,例如赋权,使代理能够增加他们能够满足未来稳态需求的概率,从而增强他们维护身体完整性的能力。我们在一个强化学习框架内形式化了这些概念,这使我们能够研究如何在开放式多代理环境中学习的内在驱动的具象代理可以培养开放性和关怀的能力。
更新时间: 2025-10-08 15:10:26
领域: cs.AI,cs.LG
GNN-enhanced Traffic Anomaly Detection for Next-Generation SDN-Enabled Consumer Electronics
Consumer electronics (CE) connected to the Internet of Things are susceptible to various attacks, including DDoS and web-based threats, which can compromise their functionality and facilitate remote hijacking. These vulnerabilities allow attackers to exploit CE for broader system attacks while enabling the propagation of malicious code across the CE network, resulting in device failures. Existing deep learning-based traffic anomaly detection systems exhibit high accuracy in traditional network environments but are often overly complex and reliant on static infrastructure, necessitating manual configuration and management. To address these limitations, we propose a scalable network model that integrates Software-defined Networking (SDN) and Compute First Networking (CFN) for next-generation CE networks. In this network model, we propose a Graph Neural Networks-based Network Anomaly Detection framework (GNN-NAD) that integrates SDN-based CE networks and enables the CFN architecture. GNN-NAD uniquely fuses a static, vulnerability-aware attack graph with dynamic traffic features, providing a holistic view of network security. The core of the framework is a GNN model (GSAGE) for graph representation learning, followed by a Random Forest (RF) classifier. This design (GSAGE+RF) demonstrates superior performance compared to existing feature selection methods. Experimental evaluations on CE environment reveal that GNN-NAD achieves superior metrics in accuracy, recall, precision, and F1 score, even with small sample sizes, exceeding the performance of current network anomaly detection methods. This work advances the security and efficiency of next-generation intelligent CE networks.
Updated: 2025-10-08 15:01:40
标题: GNN增强的下一代SDN支持的消费类电子产品交通异常检测
摘要: 消费电子产品(CE)连接到物联网是容易受到各种攻击的,包括DDoS和基于web的威胁,这可能会危及它们的功能,并促使远程劫持。这些漏洞使攻击者能够利用CE进行更广泛的系统攻击,同时在CE网络中传播恶意代码,导致设备故障。现有基于深度学习的流量异常检测系统在传统网络环境中表现出较高的准确性,但往往过于复杂,并且依赖于静态基础设施,需要手动配置和管理。为了解决这些限制,我们提出了一个集成了软件定义网络(SDN)和优先计算网络(CFN)的可扩展网络模型,用于下一代CE网络。在这个网络模型中,我们提出了一个基于图神经网络的网络异常检测框架(GNN-NAD),它集成了基于SDN的CE网络,并启用了CFN架构。GNN-NAD独特地将静态、脆弱性感知的攻击图与动态流量特征融合在一起,提供了对网络安全的全面视图。框架的核心是一个用于图表示学习的GNN模型(GSAGE),后跟一个随机森林(RF)分类器。这种设计(GSAGE+RF)表现出比现有特征选择方法更优越的性能。在CE环境上进行的实验评估显示,GNN-NAD在准确性、召回率、精确度和F1分数方面实现了优越的指标,即使在样本量较小的情况下,也超过了当前网络异常检测方法的性能。这项工作推进了下一代智能CE网络的安全性和效率。
更新时间: 2025-10-08 15:01:40
领域: cs.CR,cs.LG,cs.NI,C.2.0; C.2.1; C.2.3; C.2.5; I.2.6; K.6.5
Active Control of Turbulent Airfoil Flows Using Adjoint-based Deep Learning
We train active neural-network flow controllers using a deep learning PDE augmentation method to optimize lift-to-drag ratios in turbulent airfoil flows at Reynolds number $5\times10^4$ and Mach number 0.4. Direct numerical simulation and large eddy simulation are employed to model compressible, unconfined flow over two- and three-dimensional semi-infinite NACA 0012 airfoils at angles of attack $\alpha = 5^\circ$, $10^\circ$, and $15^\circ$. Control actions, implemented through a blowing/suction jet at a fixed location and geometry on the upper surface, are adaptively determined by a neural network that maps local pressure measurements to optimal jet total pressure, enabling a sensor-informed control policy that responds spatially and temporally to unsteady flow conditions. The sensitivities of the flow to the neural network parameters are computed using the adjoint Navier-Stokes equations, which we construct using automatic differentiation applied to the flow solver. The trained flow controllers significantly improve the lift-to-drag ratios and reduce flow separation for both two- and three-dimensional airfoil flows, especially at $\alpha = 5^\circ$ and $10^\circ$. The 2D-trained models remain effective when applied out-of-sample to 3D flows, which demonstrates the robustness of the adjoint-trained control approach. The 3D-trained models capture the flow dynamics even more effectively, which leads to better energy efficiency and comparable performance for both adaptive (neural network) and offline (simplified, constant-pressure) controllers. These results underscore the effectiveness of this learning-based approach in improving aerodynamic performance.
Updated: 2025-10-08 14:59:29
标题: 使用基于伴随法的深度学习对湍流翼型流动进行主动控制
摘要: 我们使用深度学习PDE增强方法训练主动神经网络流控制器,以优化雷诺数为$5\times10^4$和马赫数为0.4的湍流翼型流动中的升阻比。直接数值模拟和大涡模拟被用来模拟在攻角$\alpha = 5^\circ$、$10^\circ$和$15^\circ$时通过二维和三维半无限NACA 0012翼型的可压缩、非限制流动。通过在上表面的固定位置和几何形状上实施吹/吸喷口的控制行为,通过神经网络将局部压力测量映射到最佳喷口总压力来自适应确定控制策略,从而实现对空间和时间上的不稳定流动条件做出响应的传感器驱动的控制策略。利用伴随Navier-Stokes方程计算流动对神经网络参数的敏感性,我们使用自动微分应用于流动求解器构建这些方程。经过训练的流控制器显著改善了二维和三维翼型流动的升阻比,并在$\alpha = 5^\circ$和$10^\circ$时减少了流动分离。当将2D训练模型应用于3D流动时,这些模型仍然有效,这表明了伴随训练控制方法的稳健性。3D训练模型更有效地捕获了流动动态,从而实现了更好的能量效率,并且在自适应(神经网络)和离线(简化的恒定压力)控制器之间表现出可比性。这些结果强调了这种基于学习的方法在改善空气动力性能方面的有效性。
更新时间: 2025-10-08 14:59:29
领域: physics.flu-dyn,cs.LG
Diffusion-Augmented Reinforcement Learning for Robust Portfolio Optimization under Stress Scenarios
In the ever-changing and intricate landscape of financial markets, portfolio optimisation remains a formidable challenge for investors and asset managers. Conventional methods often struggle to capture the complex dynamics of market behaviour and align with diverse investor preferences. To address this, we propose an innovative framework, termed Diffusion-Augmented Reinforcement Learning (DARL), which synergistically integrates Denoising Diffusion Probabilistic Models (DDPMs) with Deep Reinforcement Learning (DRL) for portfolio management. By leveraging DDPMs to generate synthetic market crash scenarios conditioned on varying stress intensities, our approach significantly enhances the robustness of training data. Empirical evaluations demonstrate that DARL outperforms traditional baselines, delivering superior risk-adjusted returns and resilience against unforeseen crises, such as the 2025 Tariff Crisis. This work offers a robust and practical methodology to bolster stress resilience in DRL-driven financial applications.
Updated: 2025-10-08 14:56:50
标题: 扩散增强强化学习用于应对压力情景下的鲁棒投资组合优化
摘要: 在金融市场不断变化和错综复杂的情况下,投资者和资产管理人员仍然面临着巨大的挑战,即投资组合优化。传统方法往往难以捕捉市场行为的复杂动态并与不同投资者偏好相一致。为了解决这个问题,我们提出了一个创新框架,称为扩散增强强化学习(DARL),该框架将去噪扩散概率模型(DDPMs)与深度强化学习(DRL)相结合,用于投资组合管理。通过利用DDPMs生成根据不同压力强度条件的合成市场崩盘情景,我们的方法显著增强了训练数据的稳健性。实证评估表明,DARL优于传统基线,提供了更优异的风险调整回报率,并且对于像2025年关税危机这样的意外危机具有更强的韧性。这项工作提供了一种稳健且实用的方法论,用于增强DRL驱动的金融应用程序的抗压力能力。
更新时间: 2025-10-08 14:56:50
领域: stat.ML,cs.CE,cs.LG,q-fin.CP
Non-Asymptotic Analysis of Efficiency in Conformalized Regression
Conformal prediction provides prediction sets with coverage guarantees. The informativeness of conformal prediction depends on its efficiency, typically quantified by the expected size of the prediction set. Prior work on the efficiency of conformalized regression commonly treats the miscoverage level $\alpha$ as a fixed constant. In this work, we establish non-asymptotic bounds on the deviation of the prediction set length from the oracle interval length for conformalized quantile and median regression trained via SGD, under mild assumptions on the data distribution. Our bounds of order $\mathcal{O}(1/\sqrt{n} + 1/(\alpha^2 n) + 1/\sqrt{m} + \exp(-\alpha^2 m))$ capture the joint dependence of efficiency on the proper training set size $n$, the calibration set size $m$, and the miscoverage level $\alpha$. The results identify phase transitions in convergence rates across different regimes of $\alpha$, offering guidance for allocating data to control excess prediction set length. Empirical results are consistent with our theoretical findings.
Updated: 2025-10-08 14:50:35
标题: 非渐近分析下的拟合回归效率
摘要: 共形预测提供具有覆盖保证的预测集。共形预测的信息性取决于其效率,通常通过预测集的期望大小来量化。关于共形化回归效率的先前研究通常将误覆盖水平$\alpha$视为固定常数。在这项工作中,我们在数据分布上假设温和的情况下,建立了共形化分位数和中位数回归的预测集长度与正态区间长度的偏差的非渐近边界,这些模型是通过SGD训练的。我们的界限的顺序为$\mathcal{O}(1/\sqrt{n} + 1/(\alpha^2 n) + 1/\sqrt{m} + \exp(-\alpha^2 m))$,捕捉了效率对适当训练集大小$n$、校准集大小$m$和误覆盖水平$\alpha$的联合依赖关系。结果确定了在不同$\alpha$区域的收敛速率中的相变,为分配数据以控制多余预测集长度提供了指导。实证结果与我们的理论发现一致。
更新时间: 2025-10-08 14:50:35
领域: cs.LG,stat.ML
TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference
Distributed inference of large language models (LLMs) can introduce overheads of up to 20% even over GPUs connected via high-speed interconnects such as NVLink. Multiple techniques have been proposed to mitigate these overheads by decomposing computations into finer-grained tasks and overlapping communication with sub-tasks as they complete. However, fine-grained decomposition of a large computation into many smaller computations on GPUs results in overheads. Furthermore, the communication itself uses many streaming multiprocessors (SMs), adding to the overhead. We present TokenWeave to address these challenges. TokenWeave proposes a Token-Splitting technique that divides the tokens in the inference batch into two approximately equal subsets in a wave-aware manner. The communication of one subset is then overlapped with the computation of the other. In addition, TokenWeave optimizes the order of the layer normalization computation with respect to communication operations and implements a novel fused AllReduce--RMSNorm kernel that carefully leverages Multimem instruction support available on Hopper and Blackwell NVIDIA GPUs. These optimizations allow TokenWeave to perform communication and RMSNorm using only 2-8 SMs. Moreover, our kernel enables the memory-bound RMSNorm to be overlapped with the other batch's computation, providing additional gains. Our evaluations demonstrate up to 1.29x speedup in latency and 1.26x higher throughput across multiple models and workloads. In several settings, TokenWeave results in better performance compared to an equivalent model with all communication removed.
Updated: 2025-10-08 14:49:25
标题: TokenWeave:分布式LLM推断的高效计算通信重叠
摘要: 大规模语言模型(LLMs)的分布式推断甚至可以在通过高速互连(如NVLink)连接的GPU上引入高达20%的开销。已经提出了多种技术来通过将计算分解为更细粒度的任务并在子任务完成时与通信重叠来减轻这些开销。然而,将大型计算细粒度分解为许多小型计算在GPU上会导致额外开销。此外,通信本身使用许多流多处理器(SMs),增加了开销。 我们提出TokenWeave来解决这些挑战。TokenWeave提出了一种Token-Splitting技术,以波形感知的方式将推断批次中的令牌分为两个大致相等的子集。然后,一个子集的通信与另一个子集的计算重叠。此外,TokenWeave优化了与通信操作相关的层归一化计算顺序,并实现了一种新颖的融合AllReduce-RMSNorm内核,精心利用了Hopper和Blackwell NVIDIA GPU上可用的Multimem指令支持。这些优化使TokenWeave能够仅使用2-8个SM来执行通信和RMSNorm。此外,我们的内核使内存绑定的RMSNorm能够与其他批次的计算重叠,提供额外的收益。 我们的评估显示,在多个模型和工作负载中,延迟速度提高了最多1.29倍,吞吐量提高了1.26倍。在几种设置中,与移除所有通信的等效模型相比,TokenWeave表现出更好的性能。
更新时间: 2025-10-08 14:49:25
领域: cs.DC,cs.LG
Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report
World models are a powerful paradigm in AI and robotics, enabling agents to reason about the future by predicting visual observations or compact latent states. The 1X World Model Challenge introduces an open-source benchmark of real-world humanoid interaction, with two complementary tracks: sampling, focused on forecasting future image frames, and compression, focused on predicting future discrete latent codes. For the sampling track, we adapt the video generation foundation model Wan-2.2 TI2V-5B to video-state-conditioned future frame prediction. We condition the video generation on robot states using AdaLN-Zero, and further post-train the model using LoRA. For the compression track, we train a Spatio-Temporal Transformer model from scratch. Our models achieve 23.0 dB PSNR in the sampling task and a Top-500 CE of 6.6386 in the compression task, securing 1st place in both challenges.
Updated: 2025-10-08 14:49:12
标题: 生成式人形机器人世界建模:1X世界模型挑战技术报告
摘要: 世界模型是人工智能和机器人技术中的一个强大范式,使代理能够通过预测视觉观察或紧凑潜在状态来推理未来。1X世界模型挑战引入了一个开源的现实世界人形互动基准,包括两个互补的轨道:采样,专注于预测未来图像帧,和压缩,专注于预测未来离散潜在编码。对于采样轨道,我们将视频生成基础模型Wan-2.2 TI2V-5B调整为视频状态条件下的未来帧预测。我们使用AdaLN-Zero将视频生成与机器人状态联系起来,并进一步使用LoRA对模型进行后训练。对于压缩轨道,我们从头开始训练了一个时空变换器模型。我们的模型在采样任务中实现了23.0 dB的PSNR,在压缩任务中实现了6.6386的Top-500 CE,获得了两个挑战中的第一名。
更新时间: 2025-10-08 14:49:12
领域: cs.LG,cs.AI,cs.RO
Explaining Models under Multivariate Bernoulli Distribution via Hoeffding Decomposition
Explaining the behavior of predictive models with random inputs can be achieved through sub-models decomposition, where such sub-models have easier interpretable features. Arising from the uncertainty quantification community, recent results have demonstrated the existence and uniqueness of a generalized Hoeffding decomposition for such predictive models when the stochastic input variables are correlated, based on concepts of oblique projection onto L 2 subspaces. This article focuses on the case where the input variables have Bernoulli distributions and provides a complete description of this decomposition. We show that in this case the underlying L 2 subspaces are one-dimensional and that the functional decomposition is explicit. This leads to a complete interpretability framework and theoretically allows reverse engineering. Explicit indicators of the influence of inputs on the output prediction (exemplified by Sobol' indices and Shapley effects) can be explicitly derived. Illustrated by numerical experiments, this type of analysis proves useful for addressing decision-support problems, based on binary decision diagrams, Boolean networks or binary neural networks. The article outlines perspectives for exploring high-dimensional settings and, beyond the case of binary inputs, extending these findings to models with finite countable inputs.
Updated: 2025-10-08 14:46:20
标题: 通过Hoeffding分解解释多元伯努利分布模型
摘要: 通过对随机输入的预测模型行为进行解释可以通过子模型分解来实现,这些子模型具有更容易解释的特征。最近的结果来自不确定性量化社区,已经证明了当随机输入变量相关时,基于L2子空间的斜投影概念存在并且是唯一的广义Hoeffding分解。本文关注输入变量具有伯努利分布的情况,并提供了对这种分解的完整描述。我们证明在这种情况下,基础的L2子空间是一维的,并且功能分解是明确的。这导致了一个完整的可解释性框架,理论上允许逆向工程。输入对输出预测的影响的明确指标(例如Sobol'指数和Shapley效应)可以明确推导出来。通过数值实验的例证,这种分析类型证明在处理基于二进制决策图、布尔网络或二进制神经网络的决策支持问题方面是有用的。本文概述了探索高维设置的展望,并且除了二进制输入的情况外,将这些发现扩展到具有有限可数输入的模型。
更新时间: 2025-10-08 14:46:20
领域: stat.ML,cs.LG
Non-Stationary Online Structured Prediction with Surrogate Losses
Online structured prediction, including online classification as a special case, is the task of sequentially predicting labels from input features. Therein the surrogate regret -- the cumulative excess of the target loss (e.g., 0-1 loss) over the surrogate loss (e.g., logistic loss) of the fixed best estimator -- has gained attention, particularly because it often admits a finite bound independent of the time horizon $T$. However, such guarantees break down in non-stationary environments, where every fixed estimator may incur the surrogate loss growing linearly with $T$. We address this by proving a bound of the form $F_T + C(1 + P_T)$ on the cumulative target loss, where $F_T$ is the cumulative surrogate loss of any comparator sequence, $P_T$ is its path length, and $C > 0$ is some constant. This bound depends on $T$ only through $F_T$ and $P_T$, often yielding much stronger guarantees in non-stationary environments. Our core idea is to synthesize the dynamic regret bound of the online gradient descent (OGD) with the technique of exploiting the surrogate gap. Our analysis also sheds light on a new Polyak-style learning rate for OGD, which systematically offers target-loss guarantees and exhibits promising empirical performance. We further extend our approach to a broader class of problems via the convolutional Fenchel--Young loss. Finally, we prove a lower bound showing that the dependence on $F_T$ and $P_T$ is tight.
Updated: 2025-10-08 14:43:44
标题: 非平稳在线结构预测与替代损失
摘要: 在线结构化预测,包括在线分类作为一个特例,是从输入特征顺序预测标签的任务。其中,替代遗憾——目标损失(例如0-1损失)超过固定最佳估计器的替代损失(例如逻辑损失)的累积超额——引起了关注,特别是因为它通常允许一个独立于时间范围T的有限界。然而,在非稳态环境中,这种保证会瓦解,其中每个固定估计器可能会导致替代损失随T线性增长。我们通过证明形式为F_T + C(1 + P_T)的边界来解决这个问题,其中F_T是任何比较序列的替代损失累积和,P_T是其路径长度,C > 0是一些常数。这个边界只通过F_T和P_T依赖于T,通常在非稳态环境中提供更强的保证。我们的核心思想是合成在线梯度下降(OGD)的动态遗憾边界,利用替代间隙的技术。我们的分析还揭示了OGD的一种新的Polyak风格学习率,系统地提供目标损失保证,并展示了有希望的实证表现。我们进一步通过卷积Fenchel-Young损失将我们的方法扩展到更广泛的问题类。最后,我们证明了一个下界,显示对F_T和P_T的依赖是紧密的。
更新时间: 2025-10-08 14:43:44
领域: cs.LG
Want to train KANS at scale? Now UKAN!
Kolmogorov-Arnold Networks (KANs) have recently emerged as a powerful alternative to traditional multilayer perceptrons. However, their reliance on predefined, bounded grids restricts their ability to approximate functions on unbounded domains. To address this, we present Unbounded Kolmogorov-Arnold Networks (UKANs), a method that removes the need for bounded grids in traditional Kolmogorov-Arnold Networks (KANs). The key innovation of this method is a coefficient-generator (CG) model that produces, on the fly, only the B-spline coefficients required locally on an unbounded symmetric grid. UKANs couple multilayer perceptrons with KANs by feeding the positional encoding of grid groups into the CG model, enabling function approximation on unbounded domains without requiring data normalization. To reduce the computational cost of both UKANs and KANs, we introduce a GPU-accelerated library that lowers B-spline evaluation complexity by a factor proportional to the grid size, enabling large-scale learning by leveraging efficient memory management, in line with recent software advances such as FlashAttention and FlashFFTConv. Performance benchmarking confirms the superior memory and computational efficiency of our accelerated KAN (warpKAN), and UKANs, showing a 3-30x speed-up and up to 1000x memory reduction compared to vanilla KANs. Experiments on regression, classification, and generative tasks demonstrate the effectiveness of UKANs to match or surpass KAN accuracy. Finally, we use both accelerated KAN and UKAN in a molecular property prediction task, establishing the feasibility of large-scale end-to-end training with our optimized implementation.
Updated: 2025-10-08 14:41:42
标题: 想要在规模上进行KANS培训吗?现在使用UKAN!
摘要: 科尔莫哥洛夫-阿诺德网络(KANs)最近已经成为传统多层感知器的强大替代方案。然而,它们依赖于预定义的有界网格限制了它们在无界域上逼近函数的能力。为了解决这个问题,我们提出了无界科尔莫哥洛夫-阿诺德网络(UKANs),这种方法消除了传统科尔莫哥洛夫-阿诺德网络(KANs)中有界网格的需求。该方法的关键创新是一个系数生成器(CG)模型,它在对称无界网格上仅生成所需的B样条系数。UKANs通过将网格组的位置编码输入CG模型,将多层感知器与KANs耦合,从而在无界域上进行函数逼近而无需数据标准化。为了降低UKANs和KANs的计算成本,我们引入了一个GPU加速库,通过降低与网格大小成比例的B样条评估复杂度,实现了大规模学习,利用高效的内存管理,符合最近的软件进展如FlashAttention和FlashFFTConv。性能基准测试证实了我们加速的KAN(warpKAN)和UKANs的优越内存和计算效率,相比于普通的KANs,速度提高了3-30倍,内存减少了最多1000倍。在回归、分类和生成任务的实验中,证明了UKANs与KAN的准确性相匹敌或超越。最后,我们在分子属性预测任务中使用了加速的KAN和UKAN,证实了通过我们优化的实现进行大规模端到端训练的可行性。
更新时间: 2025-10-08 14:41:42
领域: cs.LG
HTMformer: Hybrid Time and Multivariate Transformer for Time Series Forecasting
Transformer-based methods have achieved impressive results in time series forecasting. However, existing Transformers still exhibit limitations in sequence modeling as they tend to overemphasize temporal dependencies. This incurs additional computational overhead without yielding corresponding performance gains. We find that the performance of Transformers is highly dependent on the embedding method used to learn effective representations. To address this issue, we extract multivariate features to augment the effective information captured in the embedding layer, yielding multidimensional embeddings that convey richer and more meaningful sequence representations. These representations enable Transformer-based forecasters to better understand the series. Specifically, we introduce Hybrid Temporal and Multivariate Embeddings (HTME). The HTME extractor integrates a lightweight temporal feature extraction module with a carefully designed multivariate feature extraction module to provide complementary features, thereby achieving a balance between model complexity and performance. By combining HTME with the Transformer architecture, we present HTMformer, leveraging the enhanced feature extraction capability of the HTME extractor to build a lightweight forecaster. Experiments conducted on eight real-world datasets demonstrate that our approach outperforms existing baselines in both accuracy and efficiency.
Updated: 2025-10-08 14:40:42
标题: HTMformer:用于时间序列预测的混合时间和多元变换器
摘要: 基于Transformer的方法在时间序列预测中取得了令人印象深刻的结果。然而,现有的Transformer在序列建模方面仍存在局限性,因为它们往往过分强调时间依赖关系。这会导致额外的计算开销,而没有产生相应的性能提升。我们发现,Transformer的性能高度依赖于用于学习有效表示的嵌入方法。为了解决这个问题,我们提取多变量特征,以增强嵌入层捕获的有效信息,从而产生传达更丰富和更有意义的序列表示的多维嵌入。这些表示使基于Transformer的预测器能够更好地理解该系列。具体而言,我们引入了混合时间和多变量嵌入(HTME)。HTME提取器集成了一个轻量级的时间特征提取模块和一个精心设计的多变量特征提取模块,以提供互补特征,从而在模型复杂性和性能之间实现平衡。通过将HTME与Transformer架构相结合,我们提出了HTMformer,利用HTME提取器的增强特征提取能力来构建一个轻量级的预测器。在八个真实世界数据集上进行的实验证明,我们的方法在准确性和效率方面优于现有基线。
更新时间: 2025-10-08 14:40:42
领域: cs.LG,cs.AI
BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions
Large language models (LLMs) have demonstrated remarkable performance on single-turn text-to-SQL tasks, but real-world database applications predominantly require multi-turn interactions to handle ambiguous queries, execution errors, and evolving user requirements. Existing multi-turn benchmarks fall short by treating conversation histories as static context or limiting evaluation to read-only operations, failing to reflect production-grade database assistant challenges. We introduce BIRD-INTERACT, a benchmark that restores this realism through: (1) a comprehensive interaction environment coupling each database with a hierarchical knowledge base, metadata files, and a function-driven user simulator, enabling models to solicit clarifications, retrieve knowledge, and recover from errors without human supervision; (2) two evaluation settings consisting of a pre-defined conversational protocol (c-Interact) and an open-ended agentic setting (a-Interact) where models autonomously decide when to query the user simulator or explore the environment; (3) a challenging task suite covering the full CRUD spectrum for business-intelligence and operational use cases, guarded by executable test cases. Each task features ambiguous and follow-up sub-tasks requiring dynamic interaction. The suite comprises BIRD-INTERACT-FULL (600 tasks, up to 11,796 interactions) for comprehensive performance assessment, and BIRD-INTERACT-LITE (300 tasks with simplified databases) for detailed behavioral analysis and rapid method development. Our empirical results highlight BIRD-INTERACT's difficulty: GPT-5 completes only 8.67% of tasks in c-Interact and 17.00% in a-Interact. Analysis via memory grafting and Interaction Test-time Scaling validates the importance of effective interaction for complex, dynamic text-to-SQL tasks.
Updated: 2025-10-08 14:39:59
标题: BIRD-INTERACT: 通过动态交互的视角重新构想大语言模型的文本到SQL评估
摘要: 大型语言模型(LLMs)在单轮文本到SQL任务中表现出色,但实际数据库应用主要需要多轮交互来处理模糊查询、执行错误和不断变化的用户需求。现有的多轮基准测试存在不足之处,将对话历史视为静态上下文或将评估限制在只读操作,未能反映生产级数据库助手的挑战。我们引入了BIRD-INTERACT,一个通过以下方式恢复现实感的基准测试:(1)综合交互环境,将每个数据库与分层知识库、元数据文件和基于功能的用户模拟器耦合,使模型能够在无人监督的情况下征询澄清、检索知识并从错误中恢复;(2)两种评估设置,包括预定义的会话协议(c-Interact)和一个开放式主体设置(a-Interact),其中模型自主决定何时查询用户模拟器或探索环境;(3)一个具有挑战性的任务套件,涵盖业务智能和运营用例的完整CRUD光谱,由可执行的测试用例保护。每个任务都包含模糊和后续子任务,需要动态交互。该套件包括BIRD-INTERACT-FULL(600个任务,最多11,796次交互)用于全面性能评估,以及BIRD-INTERACT-LITE(300个任务,带简化数据库)用于详细的行为分析和快速方法开发。我们的实证结果突显了BIRD-INTERACT的难度:GPT-5仅在c-Interact中完成了8.67%的任务,在a-Interact中完成了17.00%。通过记忆嫁接和交互测试时间缩放的分析验证了对复杂、动态文本到SQL任务的有效交互的重要性。
更新时间: 2025-10-08 14:39:59
领域: cs.AI
Pseudo-MDPs: A Novel Framework for Efficiently Optimizing Last Revealer Seed Manipulations in Blockchains
This study tackles the computational challenges of solving Markov Decision Processes (MDPs) for a restricted class of problems. It is motivated by the Last Revealer Attack (LRA), which undermines fairness in some Proof-of-Stake (PoS) blockchains such as Ethereum (\$400B market capitalization). We introduce pseudo-MDPs (pMDPs) a framework that naturally models such problems and propose two distinct problem reductions to standard MDPs. One problem reduction provides a novel, counter-intuitive perspective, and combining the two problem reductions enables significant improvements in dynamic programming algorithms such as value iteration. In the case of the LRA which size is parameterized by $\kappa$ (in Ethereum's case $\kappa$= 325), we reduce the computational complexity from $O(2^\kappa \kappa^{2^{\kappa+2}})$ to $O(\kappa^4)$ (per iteration). This solution also provide the usual benefits from Dynamic Programming solutions: exponentially fast convergence toward the optimal solution is guaranteed. The dual perspective also simplifies policy extraction, making the approach well-suited for resource-constrained agents who can operate with very limited memory and computation once the problem has been solved. Furthermore, we generalize those results to a broader class of MDPs, enhancing their applicability. The framework is validated through two case studies: a fictional card game and the LRA on the Ethereum random seed consensus protocol. These applications demonstrate the framework's ability to solve large-scale problems effectively while offering actionable insights into optimal strategies. This work advances the study of MDPs and contributes to understanding security vulnerabilities in blockchain systems.
Updated: 2025-10-08 14:39:20
标题: 伪MDPs:一种在区块链中高效优化最后揭示者种子操作的新框架
摘要: 这项研究解决了解决马尔可夫决策过程(MDPs)的计算挑战,针对一类受限问题。该研究的动机是最后揭示者攻击(LRA),该攻击破坏了某些权益证明(PoS)区块链的公平性,例如以太坊(市值4000亿美元)。我们引入伪MDPs(pMDPs)框架,自然地对这类问题进行建模,并提出两个与标准MDPs有明显不同的问题简化方法。一个问题简化方法提供了一种新颖的、违反直觉的视角,将这两种问题简化方法结合起来可以显著改进动态规划算法,如价值迭代。在LRA的情况下,其大小由参数$\kappa$(以太坊的情况下$\kappa$= 325)来确定,我们将计算复杂度从$O(2^\kappa \kappa^{2^{\kappa+2}})$简化为$O(\kappa^4)$(每次迭代)。这种解决方案还提供了动态规划解决方案通常的好处:可以保证朝着最优解的指数级快速收敛。双重视角还简化了策略提取,使这种方法非常适合资源受限的代理人,在问题解决后可以仅使用非常有限的内存和计算资源运行。此外,我们将这些结果推广到更广泛的MDPs类别,增强了它们的适用性。该框架通过两个案例研究进行了验证:一个虚构的纸牌游戏和以太坊随机种子共识协议上的LRA。这些应用展示了该框架有效解决大规模问题的能力,同时提供了关于最优策略的可操作见解。这项工作推动了对MDPs的研究,并有助于理解区块链系统中的安全漏洞。
更新时间: 2025-10-08 14:39:20
领域: cs.CR,cs.LG
Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications
Amid growing efforts to leverage advances in large language models (LLMs) and vision-language models (VLMs) for robotics, Vision-Language-Action (VLA) models have recently gained significant attention. By unifying vision, language, and action data at scale, which have traditionally been studied separately, VLA models aim to learn policies that generalise across diverse tasks, objects, embodiments, and environments. This generalisation capability is expected to enable robots to solve novel downstream tasks with minimal or no additional task-specific data, facilitating more flexible and scalable real-world deployment. Unlike previous surveys that focus narrowly on action representations or high-level model architectures, this work offers a comprehensive, full-stack review, integrating both software and hardware components of VLA systems. In particular, this paper provides a systematic review of VLAs, covering their strategy and architectural transition, architectures and building blocks, modality-specific processing techniques, and learning paradigms. In addition, to support the deployment of VLAs in real-world robotic applications, we also review commonly used robot platforms, data collection strategies, publicly available datasets, data augmentation methods, and evaluation benchmarks. Throughout this comprehensive survey, this paper aims to offer practical guidance for the robotics community in applying VLAs to real-world robotic systems. All references categorized by training approach, evaluation method, modality, and dataset are available in the table on our project website: https://vla-survey.github.io .
Updated: 2025-10-08 14:38:25
标题: 视觉-语言-动作模型在机器人技术中的应用:面向真实世界应用的综述
摘要: 随着利用大型语言模型(LLMs)和视觉语言模型(VLMs)在机器人领域取得的进展日益增加,视觉-语言-动作(VLA)模型近期引起了重要关注。通过在规模上统一传统上被单独研究的视觉、语言和动作数据,VLA模型旨在学习能够横跨不同任务、物体、实体和环境的策略,以实现泛化能力。这种泛化能力预计将使机器人能够在最少或没有额外任务特定数据的情况下解决新领域任务,促进更加灵活和可伸缩的真实世界部署。与以往狭窄关注动作表示或高层模型架构的调查不同,本文提供了全面的、全栈的审查,整合了VLA系统的软件和硬件组件。具体而言,本文提供了VLAs的系统审查,涵盖了它们的策略和架构转换、架构和构建块、模态特定处理技术和学习范式。此外,为了支持VLAs在真实世界机器人应用中的部署,我们还审查了常用的机器人平台、数据收集策略、公开可用数据集、数据增强方法和评估基准。通过这一全面的调查,本文旨在为机器人学术界在将VLAs应用于真实世界机器人系统中提供实用指导。所有按训练方法、评估方法、模态和数据集分类的参考文献均可在我们项目网站的表格中找到:https://vla-survey.github.io。
更新时间: 2025-10-08 14:38:25
领域: cs.RO,cs.AI,cs.CV,cs.LG
Blind Construction of Angular Power Maps in Massive MIMO Networks
Channel state information (CSI) acquisition is a challenging problem in massive multiple-input multiple-output (MIMO) networks. Radio maps provide a promising solution for radio resource management by reducing online CSI acquisition. However, conventional approaches for radio map construction require location-labeled CSI data, which is challenging in practice. This paper investigates unsupervised angular power map construction based on large timescale CSI data collected in a massive MIMO network without location labels. A hidden Markov model (HMM) is built to connect the hidden trajectory of a mobile with the CSI evolution of a massive MIMO channel. As a result, the mobile location can be estimated, enabling the construction of an angular power map. We show that under uniform rectilinear mobility with Poisson-distributed base stations (BSs), the Cramer-Rao Lower Bound (CRLB) for localization error can vanish at any signal-to-noise ratios (SNRs), whereas when BSs are confined to a limited region, the error remains nonzero even with infinite independent measurements. Based on reference signal received power (RSRP) data collected in a real multi-cell massive MIMO network, an average localization error of 18 meters can be achieved although measurements are mainly obtained from a single serving cell.
Updated: 2025-10-08 14:32:53
标题: 大规模MIMO网络中角度功率图的盲构建
摘要: 信道状态信息(CSI)获取是大规模多输入多输出(MIMO)网络中的一个挑战性问题。无线电地图为通过减少在线CSI获取来实现无线资源管理提供了一个有前途的解决方案。然而,传统的无线电地图构建方法需要带有位置标签的CSI数据,这在实践中是具有挑战性的。本文研究了基于大规模MIMO网络中收集的无位置标签CSI数据的无监督角功率地图构建。建立了一个隐藏马尔可夫模型(HMM)来连接移动设备的隐藏轨迹和大规模MIMO信道的CSI演变。因此,可以估计移动设备的位置,从而实现角功率地图的构建。我们表明,在具有泊松分布基站(BSs)的均匀直线移动情况下,定位误差的克拉默-拉奥下界(CRLB)可以在任何信噪比(SNR)下消失,而当BSs限制在有限区域时,即使进行了无限独立测量,误差仍然不为零。根据在真实多小区大规模MIMO网络中收集的参考信号接收功率(RSRP)数据,尽管主要来自单个服务小区的测量,平均定位误差可以达到18米。
更新时间: 2025-10-08 14:32:53
领域: cs.LG
360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training
Adding sequence parallelism into LLaMA-Factory, we open-sourced 360-LLaMA-Factory at https://github.com/Qihoo360/360-LLaMA-Factory. 360-LLaMA-Factory has received wide recognition and used in models such as Light-R1 arXiv:2503.10460, TinyR1 arXiv:2503.04872, Kaggle AIMO math models and also in large companies' training frameworks. This technical report delves deeper into the different sequence parallel modes behind 360-LLaMA-Factory and discusses our implementation insights.
Updated: 2025-10-08 14:32:07
标题: 360-LLaMA-Factory:长期训练的即插即用序列并行化
摘要: 将序列并行性引入LLaMA-Factory中,我们在https://github.com/Qihoo360/360-LLaMA-Factory上开源了360-LLaMA-Factory。360-LLaMA-Factory已经得到广泛认可,并在诸如Light-R1 arXiv:2503.10460、TinyR1 arXiv:2503.04872、Kaggle AIMO数学模型以及大型公司的训练框架中使用。本技术报告深入探讨360-LLaMA-Factory背后不同的序列并行模式,并讨论我们的实现见解。
更新时间: 2025-10-08 14:32:07
领域: cs.CL,cs.LG
Data-Driven Adaptive PID Control Based on Physics-Informed Neural Networks
This article proposes a data-driven PID controller design based on the principle of adaptive gain optimization, leveraging Physics-Informed Neural Networks (PINNs) generated for predictive modeling purposes. The proposed control design method utilizes gradients of the PID gain optimization, achieved through the automatic differentiation of PINNs, to apply model predictive control using a cost function based on tracking error and control inputs. By optimizing PINNs-based PID gains, the method achieves adaptive gain tuning that ensures stability while accounting for system nonlinearities. The proposed method features a systematic framework for integrating PINNs-based models of dynamical control systems into closed-loop control systems, enabling direct application to PID control design. A series of numerical experiments is conducted to demonstrate the effectiveness of the proposed method from the control perspectives based on both time and frequency domains.
Updated: 2025-10-08 14:27:34
标题: 基于物理信息神经网络的数据驱动自适应PID控制
摘要: 本文提出了一种基于自适应增益优化原理的数据驱动PID控制器设计,利用为预测建模目的生成的物理信息神经网络(PINNs)。所提出的控制设计方法利用PID增益优化的梯度,通过PINNs的自动微分实现模型预测控制,使用基于跟踪误差和控制输入的成本函数。通过优化基于PINNs的PID增益,该方法实现了确保稳定性并考虑系统非线性的自适应增益调整。所提出的方法具有将基于PINNs的动态控制系统模型整合到闭环控制系统中的系统框架,从而可以直接应用于PID控制设计。进行了一系列数值实验,以展示所提出的方法从时间和频率域的控制角度的有效性。
更新时间: 2025-10-08 14:27:34
领域: eess.SY,cs.LG,cs.SY
Introspection in Learned Semantic Scene Graph Localisation
This work investigates how semantics influence localisation performance and robustness in a learned self-supervised, contrastive semantic localisation framework. After training a localisation network on both original and perturbed maps, we conduct a thorough post-hoc introspection analysis to probe whether the model filters environmental noise and prioritises distinctive landmarks over routine clutter. We validate various interpretability methods and present a comparative reliability analysis. Integrated gradients and Attention Weights consistently emerge as the most reliable probes of learned behaviour. A semantic class ablation further reveals an implicit weighting in which frequent objects are often down-weighted. Overall, the results indicate that the model learns noise-robust, semantically salient relations about place definition, thereby enabling explainable registration under challenging visual and structural variations.
Updated: 2025-10-08 14:21:45
标题: 学习语义场景图定位中的内省
摘要: 这项工作研究了语义如何影响学习的自监督、对比语义定位框架中的定位性能和鲁棒性。在对原始地图和扰动地图上训练了一个定位网络后,我们进行了彻底的事后内省分析,以探究模型是否过滤环境噪声并优先考虑独特的地标而不是常规的杂乱物。我们验证了各种可解释性方法,并提出了一项比较可靠性分析。整合梯度和注意权重一直被证明是对学习行为最可靠的探测器。语义类别消融进一步揭示了一种隐含的加权方式,常见的对象往往被降权。总体而言,结果表明该模型学习了关于地点定义的噪声鲁棒、语义显著的关系,从而使得在具有挑战性的视觉和结构变化下能够进行可解释的注册。
更新时间: 2025-10-08 14:21:45
领域: cs.LG,cs.AI,cs.CV,cs.RO,I.2.10; I.2.9; I.4.8; I.5.2; I.5.1
Enhancing Speech Emotion Recognition via Fine-Tuning Pre-Trained Models and Hyper-Parameter Optimisation
We propose a workflow for speech emotion recognition (SER) that combines pre-trained representations with automated hyperparameter optimisation (HPO). Using SpeechBrain wav2vec2-base model fine-tuned on IEMOCAP as the encoder, we compare two HPO strategies, Gaussian Process Bayesian Optimisation (GP-BO) and Tree-structured Parzen Estimators (TPE), under an identical four-dimensional search space and 15-trial budget, with balanced class accuracy (BCA) on the German EmoDB corpus as the objective. All experiments run on 8 CPU cores with 32 GB RAM. GP-BO achieves 0.96 BCA in 11 minutes, and TPE (Hyperopt implementation) attains 0.97 in 15 minutes. In contrast, grid search requires 143 trials and 1,680 minutes to exceed 0.9 BCA, and the best AutoSpeech 2020 baseline reports only 0.85 in 30 minutes on GPU. For cross-lingual generalisation, an EmoDB-trained HPO-tuned model improves zero-shot accuracy by 0.25 on CREMA-D and 0.26 on RAVDESS. Results show that efficient HPO with pre-trained encoders delivers competitive SER on commodity CPUs. Source code to this work is available at: https://github.com/youngaryan/speechbrain-emotion-hpo.
Updated: 2025-10-08 14:20:43
标题: 通过微调预训练模型和超参数优化提升语音情感识别
摘要: 我们提出了一种语音情感识别(SER)的工作流程,将预训练表示与自动化超参数优化(HPO)相结合。使用在IEMOCAP上微调的SpeechBrain wav2vec2-base模型作为编码器,我们比较了两种HPO策略,高斯过程贝叶斯优化(GP-BO)和树状Parzen估计器(TPE),在相同的四维搜索空间和15次试验预算下,以德国EmoDB语料库上平衡类准确率(BCA)作为目标。所有实验在8个CPU核心和32 GB RAM上运行。GP-BO在11分钟内实现了0.96的BCA,而TPE(Hyperopt实现)在15分钟内达到了0.97。相比之下,网格搜索需要143次试验和1,680分钟才能超过0.9的BCA,而最佳的AutoSpeech 2020基线在GPU上30分钟内仅报告了0.85。对于跨语言的泛化,一个在EmoDB上训练的HPO调优模型在CREMA-D上提高了0.25的零样本准确率,在RAVDESS上提高了0.26。结果表明,具有预训练编码器的高效HPO在普通CPU上提供了有竞争力的SER。这项工作的源代码可在以下链接找到:https://github.com/youngaryan/speechbrain-emotion-hpo。
更新时间: 2025-10-08 14:20:43
领域: cs.LG
COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization
Real-world large language model (LLM) agents must master strategic tool use and user preference optimization through multi-turn interactions to assist users with complex planning tasks. We introduce COMPASS (Constrained Optimization through Multi-turn Planning and Strategic Solutions), a benchmark that evaluates agents on realistic travel-planning scenarios. We cast travel planning as a constrained preference optimization problem, where agents must satisfy hard constraints while simultaneously optimizing soft user preferences. To support this, we build a realistic travel database covering transportation, accommodation, and ticketing for 20 U.S. National Parks, along with a comprehensive tool ecosystem that mirrors commercial booking platforms. Evaluating state-of-the-art models, we uncover two critical gaps: (i) an acceptable-optimal gap, where agents reliably meet constraints but fail to optimize preferences, and (ii) a plan-coordination gap, where performance collapses on multi-service (flight and hotel) coordination tasks, especially for open-source models. By grounding reasoning and planning in a practical, user-facing domain, COMPASS provides a benchmark that directly measures an agent's ability to optimize user preferences in realistic tasks, bridging theoretical advances with real-world impact.
Updated: 2025-10-08 14:09:46
标题: COMPASS:面向工具介入规划和偏好优化的多轮基准测试
摘要: 真实世界的大型语言模型(LLM)代理必须通过多轮交互来掌握战略工具使用和用户偏好优化,以帮助用户完成复杂的规划任务。我们引入了COMPASS(通过多轮规划和战略解决方案进行约束优化),这是一个评估代理在现实旅行规划场景中的基准。我们将旅行规划作为一个受限偏好优化问题,代理必须在同时优化软用户偏好的同时满足硬性约束。为了支持这一点,我们建立了一个涵盖20个美国国家公园的交通、住宿和购票信息的真实旅行数据库,以及一个反映商业预订平台的综合工具生态系统。通过评估最先进的模型,我们发现两个关键差距:(i)可接受的-最优差距,代理可靠地满足约束条件但未能优化偏好;(ii)计划协调差距,在多服务(飞行和酒店)协调任务中性能下降,尤其是对于开源模型。通过将推理和规划基于实际、面向用户的领域,COMPASS提供了一个基准,直接衡量代理在现实任务中优化用户偏好的能力,将理论进步与实际影响联系起来。
更新时间: 2025-10-08 14:09:46
领域: cs.LG
AdaDim: Dimensionality Adaptation for SSL Representational Dynamics
A key factor in effective Self-Supervised learning (SSL) is preventing dimensional collapse, where higher-dimensional representation spaces ($R$) span a lower-dimensional subspace. Therefore, SSL optimization strategies involve guiding a model to produce $R$ with a higher dimensionality ($H(R)$) through objectives that encourage decorrelation of features or sample uniformity in $R$. A higher $H(R)$ indicates that $R$ has greater feature diversity which is useful for generalization to downstream tasks. Alongside dimensionality optimization, SSL algorithms also utilize a projection head that maps $R$ into an embedding space $Z$. Recent work has characterized the projection head as a filter of noisy or irrelevant features from the SSL objective by reducing the mutual information $I(R;Z)$. Therefore, the current literature's view is that a good SSL representation space should have a high $H(R)$ and a low $I(R;Z)$. However, this view of SSL is lacking in terms of an understanding of the underlying training dynamics that influences the relationship between both terms. Our analysis shows that the best performing SSL models do not have the highest $H(R)$ nor the lowest $I(R;Z)$, but effectively arrive at a balance between both. To take advantage of this analysis, we introduce AdaDim, a training strategy that leverages SSL training dynamics by adaptively balancing between increasing $H(R)$ through feature decorrelation and sample uniformity as well as gradual regularization of $I(R;Z)$ as training progresses. We show performance improvements of up to 3% over common SSL baselines despite our method not utilizing expensive techniques such as queues, clustering, predictor networks, or student-teacher architectures.
Updated: 2025-10-08 14:09:23
标题: AdaDim:SSL表征动态的维度适应
摘要: 一个有效的自监督学习(SSL)中的关键因素是防止维度坍缩,即高维表示空间($R$)跨越一个低维子空间。因此,SSL优化策略涉及引导模型通过鼓励特征去相关或在$R$中样本均匀性的目标来生成具有更高维度($H(R)$)的$R$。较高的$H(R)$表示$R$具有更大的特征多样性,这对泛化到下游任务是有用的。除了维度优化外,SSL算法还利用一个将$R$映射到嵌入空间$Z$的投影头。最近的研究将投影头描述为通过减少互信息$I(R;Z)$来过滤SSL目标中的嘈杂或无关特征。因此,当前文献认为一个良好的SSL表示空间应具有高$H(R)$和低$I(R;Z)$。然而,这种对SSL的看法在理解影响这两个术语之间关系的基础训练动态方面存在不足。我们的分析表明,表现最佳的SSL模型既不具有最高的$H(R)$,也不具有最低的$I(R;Z)$,而是在两者之间有效地取得平衡。为了利用这一分析,我们引入了AdaDim,一种利用自适应平衡增加$H(R)$和逐渐对$I(R;Z)$进行正则化的SSL训练策略。尽管我们的方法没有利用昂贵的技术,如队列、聚类、预测网络或师生架构,但我们展示了与常见的SSL基线相比高达3%的性能改进。
更新时间: 2025-10-08 14:09:23
领域: cs.LG,cs.AI
Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
Auto-bidding serves as a critical tool for advertisers to improve their advertising performance. Recent progress has demonstrated that AI-Generated Bidding (AIGB), which learns a conditional generative planner from offline data, achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still face a performance bottleneck due to their inherent inability to explore beyond the static offline dataset. To address this, we propose {AIGB-Pearl} (\emph{{P}lanning with {E}valu{A}tor via RL}), a novel method that integrates generative planning and policy optimization. The core of AIGB-Pearl lies in constructing a trajectory evaluator for scoring generation quality and designing a provably sound KL-Lipschitz-constrained score maximization scheme to ensure safe and efficient exploration beyond the offline dataset. A practical algorithm incorporating the synchronous coupling technique is further devised to ensure the model regularity required by the proposed scheme. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.
Updated: 2025-10-08 14:06:32
标题: 通过离线奖励评估和策略搜索增强生成式自动出价
摘要: 自动投标是广告商改善广告表现的关键工具。最近的进展表明,从离线数据中学习条件生成规划器的人工智能生成竞价(AIGB)相比于典型的离线强化学习(RL)自动投标方法,表现更优越。然而,现有的AIGB方法仍然面临性能瓶颈,因为它们固有的无法探索超出静态离线数据集的能力。为了解决这个问题,我们提出了{AIGB-Pearl}(通过RL进行评估者的规划),这是一种集成生成规划和策略优化的新方法。AIGB-Pearl的核心在于构建一个轨迹评估器来评分生成质量,并设计一个经过证明的KL-Lipschitz约束评分最大化方案,以确保在离线数据集之外进行安全和高效的探索。进一步设计了一个实用算法,采用同步耦合技术,以确保所提出方案所需的模型规则性。对模拟和真实广告系统进行的大量实验表明了我们方法的最先进性能。
更新时间: 2025-10-08 14:06:32
领域: cs.LG,cs.AI
Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in Multimodal LLMs
Hallucinations in LLMs--especially in multimodal settings--undermine reliability. We present a rigorous, information-geometric framework in diffusion dynamics that quantifies hallucination in MLLMs: model outputs are embedded spectrally on multimodal graph Laplacians, and gaps to a truth manifold define a semantic-distortion metric. We derive Courant--Fischer bounds on a temperature-dependent hallucination energy and use RKHS eigenmodes to obtain modality-aware, interpretable measures that track evolution over prompts and time. This reframes hallucination as measurable and bounded, providing a principled basis for evaluation and mitigation.
Updated: 2025-10-08 14:06:26
标题: 接地未接地:一种用于量化多模式LLMs中幻觉的光谱图框架
摘要: 在LLMs中的幻觉,特别是在多模态环境中,会削弱可靠性。我们提出了一个严格的信息几何框架,用扩散动力学来量化MLLMs中的幻觉:模型输出被谱嵌入多模态图拉普拉斯上,与真实流形的差距定义了一个语义失真度量。我们推导了一个依赖于温度的幻觉能量的Courant-Fischer界限,并使用RKHS特征模式获得模态感知、可解释的度量,以跟踪随提示和时间的演变。这将幻觉重新框定为可测量的和有界的,为评估和缓解提供了一种合理的基础。
更新时间: 2025-10-08 14:06:26
领域: cs.LG,cs.AI,53B21, 46E22 (Primary), 68R10 (Secondary)
RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation
The rapid advancement of diffusion models has enabled high-fidelity and semantically rich text-to-image generation; however, ensuring fairness and safety remains an open challenge. Existing methods typically improve fairness and safety at the expense of semantic fidelity and image quality. In this work, we propose RespoDiff, a novel framework for responsible text-to-image generation that incorporates a dual-module transformation on the intermediate bottleneck representations of diffusion models. Our approach introduces two distinct learnable modules: one focused on capturing and enforcing responsible concepts, such as fairness and safety, and the other dedicated to maintaining semantic alignment with neutral prompts. To facilitate the dual learning process, we introduce a novel score-matching objective that enables effective coordination between the modules. Our method outperforms state-of-the-art methods in responsible generation by ensuring semantic alignment while optimizing both objectives without compromising image fidelity. Our approach improves responsible and semantically coherent generation by 20% across diverse, unseen prompts. Moreover, it integrates seamlessly into large-scale models like SDXL, enhancing fairness and safety. Code will be released upon acceptance.
Updated: 2025-10-08 14:03:31
标题: RespoDiff: 负责和忠实T2I生成的双模块瓶颈转换
摘要: 扩散模型的快速发展使得文本到图像生成变得高保真和语义丰富;然而,确保公平性和安全性仍然是一个开放性挑战。现有方法通常是以牺牲语义保真度和图像质量来改善公平性和安全性。在这项工作中,我们提出了RespoDiff,一个新颖的框架,用于负责任的文本到图像生成,它结合了扩散模型中间瓶颈表示的双模块转换。我们的方法引入了两个不同的可学习模块:一个专注于捕捉和强化负责任概念,如公平性和安全性,另一个专门用于保持与中性提示的语义对齐。为了促进双学习过程,我们引入了一个新颖的得分匹配目标,使得模块之间能够有效协调。我们的方法在负责任生成方面优于最先进的方法,确保语义对齐的同时优化两个目标,而不会牺牲图像的保真度。我们的方法通过各种未见提示,提高了20%的负责任和语义连贯生成。此外,它无缝集成到大规模模型如SDXL中,增强了公平性和安全性。代码将在接受后发布。
更新时间: 2025-10-08 14:03:31
领域: cs.CV,cs.LG
Unified Molecule Pre-training with Flexible 2D and 3D Modalities: Single and Paired Modality Integration
Molecular representation learning plays a crucial role in advancing applications such as drug discovery and material design. Existing work leverages 2D and 3D modalities of molecular information for pre-training, aiming to capture comprehensive structural and geometric insights. However, these methods require paired 2D and 3D molecular data to train the model effectively and prevent it from collapsing into a single modality, posing limitations in scenarios where a certain modality is unavailable or computationally expensive to generate. To overcome this limitation, we propose FlexMol, a flexible molecule pre-training framework that learns unified molecular representations while supporting single-modality input. Specifically, inspired by the unified structure in vision-language models, our approach employs separate models for 2D and 3D molecular data, leverages parameter sharing to improve computational efficiency, and utilizes a decoder to generate features for the missing modality. This enables a multistage continuous learning process where both modalities contribute collaboratively during training, while ensuring robustness when only one modality is available during inference. Extensive experiments demonstrate that FlexMol achieves superior performance across a wide range of molecular property prediction tasks, and we also empirically demonstrate its effectiveness with incomplete data. Our code and data are available at https://github.com/tewiSong/FlexMol.
Updated: 2025-10-08 14:02:51
标题: 统一分子预训练与灵活的二维和三维模态:单一和配对模态集成
摘要: 分子表示学习在推进药物发现和材料设计等应用方面起着至关重要的作用。现有工作利用分子信息的2D和3D模态进行预训练,旨在捕获全面的结构和几何见解。然而,这些方法需要成对的2D和3D分子数据来有效训练模型,防止模型陷入单一模态,这在某些模态不可用或计算昂贵的情况下存在限制。为了克服这一限制,我们提出了FlexMol,一种灵活的分子预训练框架,可以学习统一的分子表示,同时支持单一模态输入。具体而言,受到视觉语言模型中统一结构的启发,我们的方法使用单独的模型处理2D和3D分子数据,利用参数共享来提高计算效率,并利用解码器为缺失的模态生成特征。这使得在训练期间两种模态都能够共同贡献,同时在推理期间只有一种模态可用时保证鲁棒性。大量实验证明,FlexMol在各种分子属性预测任务中取得了优越的性能,并且我们还通过不完整数据在经验上证明了其有效性。我们的代码和数据可在https://github.com/tewiSong/FlexMol 上获取。
更新时间: 2025-10-08 14:02:51
领域: cs.LG,cs.AI
Maximising the Utility of Validation Sets for Imbalanced Noisy-label Meta-learning
Meta-learning is an effective method to handle imbalanced and noisy-label learning, but it depends on a validation set containing randomly selected, manually labelled and balanced distributed samples. The random selection and manual labelling and balancing of this validation set is not only sub-optimal for meta-learning, but it also scales poorly with the number of classes. Hence, recent meta-learning papers have proposed ad-hoc heuristics to automatically build and label this validation set, but these heuristics are still sub-optimal for meta-learning. In this paper, we analyse the meta-learning algorithm and propose new criteria to characterise the utility of the validation set, based on: 1) the informativeness of the validation set; 2) the class distribution balance of the set; and 3) the correctness of the labels of the set. Furthermore, we propose a new imbalanced noisy-label meta-learning (INOLML) algorithm that automatically builds a validation set by maximising its utility using the criteria above. Our method shows significant improvements over previous meta-learning approaches and sets the new state-of-the-art on several benchmarks.
Updated: 2025-10-08 13:56:57
标题: 最大化用于不平衡嘈杂标签元学习的验证集效用
摘要: 元学习是处理不平衡和嘈杂标签学习的有效方法,但它依赖于包含随机选取、手动标记和平衡分布样本的验证集。对于元学习,验证集的随机选择、手动标记和平衡不仅不是最佳选择,而且随着类别数量的增加,效果也变得不佳。因此,最近的元学习论文提出了自动构建和标记验证集的临时启发式方法,但这些启发式方法仍然不是最佳的元学习方法。本文分析了元学习算法,并提出了新的标准来表征验证集的效用,基于:1)验证集的信息量;2)集合的类别分布平衡性;和3)集合标签的正确性。此外,我们提出了一种新的不平衡嘈杂标签元学习(INOLML)算法,通过最大化上述标准来自动构建验证集,以提高其效用。我们的方法在几个基准测试中显著改进了以前的元学习方法,并树立了新的技术水平。
更新时间: 2025-10-08 13:56:57
领域: cs.LG,cs.CV
Universally Composable Termination Analysis of Tendermint
Modern blockchain systems operating in adversarial environments require robust consensus protocols that guarantee both safety and termination under network delay attacks. Tendermint, a widely adopted consensus protocol in consortium blockchains, achieves high throughput and finality. However, previous analysis of the safety and termination has been done in a standalone fashion, with no consideration of the composition with other protocols interacting with it in a concurrent manner. Moreover, the termination properties under adaptive network delays caused by Byzantine adversaries have not been formally analyzed. This paper presents the first universally composable (UC) security analysis of Tendermint, demonstrating its resilience against strategic message-delay attacks. By constructing a UC ideal model of Tendermint, we formalize its core mechanisms: phase-base consensus procedure, dynamic timeouts, proposal locking, leader rotation, and others, under a network adversary that selectively delays protocol messages. Our main result proves that the Tendermint protocol UC-realizes the ideal Tendermint model, which ensures bounded termination latency, i.e., guaranteed termination, even when up to $f<n/3$ nodes are Byzantine (where $n$ is the number of nodes participating in the consensus), provided that network delays remain within a protocol-defined threshold under the partially synchronous net assumption. Specifically, through formal proofs within the UC framework, we show that Tendermint maintains safety and termination. By the composition theorem of UC, this guarantees that these properties are maintained when Tendermint is composed with various blockchain components.
Updated: 2025-10-08 13:52:35
标题: Tendermint的普适可组合终止分析
摘要: 在对抗性环境中运行的现代区块链系统需要健壮的共识协议,以确保在网络延迟攻击下同时保证安全性和终止性。Tendermint是联盟区块链中广泛采用的共识协议,实现了高吞吐量和最终性。然而,以前对安全性和终止性的分析是以独立方式进行的,没有考虑与其他协议并发交互的情况。此外,由拜占庭对手引起的自适应网络延迟下的终止属性尚未得到正式分析。本文提出了Tendermint的第一个通用可组合(UC)安全性分析,展示了其对策略性消息延迟攻击的抵抗力。通过构建Tendermint的UC理想模型,我们正式化了其核心机制:基于阶段的共识程序、动态超时、提案锁定、领导者轮换等,在选择性延迟协议消息的网络对手的情况下。我们的主要结果证明了Tendermint协议UC实现了理想的Tendermint模型,这确保了有界的终止延迟,即在最多$f<n/3$个拜占庭节点存在的情况下(其中$n$是参与共识的节点数),只要网络延迟保持在部分同步网络假设下协议定义的阈值内。具体来说,通过UC框架内的形式证明,我们展示了Tendermint保持安全性和终止性。通过UC的组合定理,这确保了当Tendermint与各种区块链组件组合时这些属性得以维持。
更新时间: 2025-10-08 13:52:35
领域: cs.CR
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
Vision-language models (VLMs) trained on internet-scale data achieve remarkable zero-shot detection performance on common objects like car, truck, and pedestrian. However, state-of-the-art models still struggle to generalize to out-of-distribution classes, tasks and imaging modalities not typically found in their pre-training. Rather than simply re-training VLMs on more visual data, we argue that one should align VLMs to new concepts with annotation instructions containing a few visual examples and rich textual descriptions. To this end, we introduce Roboflow100-VL, a large-scale collection of 100 multi-modal object detection datasets with diverse concepts not commonly found in VLM pre-training. We evaluate state-of-the-art models on our benchmark in zero-shot, few-shot, semi-supervised, and fully-supervised settings, allowing for comparison across data regimes. Notably, we find that VLMs like GroundingDINO and Qwen2.5-VL achieve less than 2% zero-shot accuracy on challenging medical imaging datasets within Roboflow100-VL, demonstrating the need for few-shot concept alignment. Lastly, we discuss our recent CVPR 2025 Foundational FSOD competition and share insights from the community. Notably, the winning team significantly outperforms our baseline by 17 mAP! Our code and dataset are available at https://github.com/roboflow/rf100-vl and https://universe.roboflow.com/rf100-vl/.
Updated: 2025-10-08 13:51:05
标题: Roboflow100-VL:用于视觉语言模型的多领域目标检测基准测试
摘要: 视觉语言模型(VLMs)在互联网规模数据上训练,实现了对常见物体(如汽车、卡车和行人)的显著零样本检测性能。然而,最先进的模型仍然难以推广到其预训练中通常不会出现的超出分布类别、任务和成像方式。我们认为,与其简单地在更多视觉数据上重新训练VLMs,不如通过包含少量视觉示例和丰富的文本描述的注释指导来使VLMs与新概念保持一致。为此,我们引入了Roboflow100-VL,这是一个包含100个多模态目标检测数据集的大规模集合,其中包含了在VLM预训练中通常不常见的多样概念。我们在零样本、少样本、半监督和全监督设置下评估了最先进的模型在我们的基准测试中的表现,从而实现跨数据范围的比较。值得注意的是,我们发现像GroundingDINO和Qwen2.5-VL这样的VLMs在Roboflow100-VL中具有挑战性的医学成像数据集上的零样本准确率不到2%,这表明需要进行少样本概念对齐。最后,我们讨论了我们最近的CVPR 2025基础FSOD竞赛,并分享了社区的见解。值得注意的是,获胜团队的性能比我们的基准提高了17个mAP!我们的代码和数据集可在https://github.com/roboflow/rf100-vl和https://universe.roboflow.com/rf100-vl/上找到。
更新时间: 2025-10-08 13:51:05
领域: cs.CV,cs.CL,cs.LG
Error Bounds for Physics-Informed Neural Networks in Fokker-Planck PDEs
Stochastic differential equations are commonly used to describe the evolution of stochastic processes. The state uncertainty of such processes is best represented by the probability density function (PDF), whose evolution is governed by the Fokker-Planck partial differential equation (FP-PDE). However, it is generally infeasible to solve the FP-PDE in closed form. In this work, we show that physics-informed neural networks (PINNs) can be trained to approximate the solution PDF. Our main contribution is the analysis of PINN approximation error: we develop a theoretical framework to construct tight error bounds using PINNs. In addition, we derive a practical error bound that can be efficiently constructed with standard training methods. We discuss that this error-bound framework generalizes to approximate solutions of other linear PDEs. Empirical results on nonlinear, high-dimensional, and chaotic systems validate the correctness of our error bounds while demonstrating the scalability of PINNs and their significant computational speedup in obtaining accurate PDF solutions compared to the Monte Carlo approach.
Updated: 2025-10-08 13:47:39
标题: Fokker-Planck偏微分方程中物理启发神经网络的误差界限
摘要: 随机微分方程常用来描述随机过程的演化。这类过程的状态不确定性最好由概率密度函数(PDF)表示,其演化由福克-普朗克偏微分方程(FP-PDE)控制。然而,一般来说,解FP-PDE闭式解是不可行的。在本研究中,我们展示了物理启发式神经网络(PINNs)可以被训练来逼近解PDF。我们的主要贡献是对PINN逼近误差的分析:我们开发了一个理论框架,使用PINNs构建紧凑的误差界。此外,我们推导出一个可以通过标准训练方法有效构建的实际误差界。我们讨论了这种误差界框架如何泛化到其他线性PDE的近似解。对非线性、高维和混沌系统的实证结果验证了我们误差界的正确性,同时展示了PINNs的可扩展性及其相对于蒙特卡罗方法在获得准确PDF解时的显著计算加速。
更新时间: 2025-10-08 13:47:39
领域: cs.LG,cs.AI,cs.NA,math.NA,physics.comp-ph
Federated Unlearning in the Wild: Rethinking Fairness and Data Discrepancy
Machine unlearning is critical for enforcing data deletion rights like the "right to be forgotten." As a decentralized paradigm, Federated Learning (FL) also requires unlearning, but realistic implementations face two major challenges. First, fairness in Federated Unlearning (FU) is often overlooked. Exact unlearning methods typically force all clients into costly retraining, even those uninvolved. Approximate approaches, using gradient ascent or distillation, make coarse interventions that can unfairly degrade performance for clients with only retained data. Second, most FU evaluations rely on synthetic data assumptions (IID/non-IID) that ignore real-world heterogeneity. These unrealistic benchmarks obscure the true impact of unlearning and limit the applicability of current methods. We first conduct a comprehensive benchmark of existing FU methods under realistic data heterogeneity and fairness conditions. We then propose a novel, fairness-aware FU approach, Federated Cross-Client-Constrains Unlearning (FedCCCU), to explicitly address both challenges. FedCCCU offers a practical and scalable solution for real-world FU. Experimental results show that existing methods perform poorly in realistic settings, while our approach consistently outperforms them.
Updated: 2025-10-08 13:47:19
标题: 荒野中的联邦遗忘:重新思考公平性和数据差异
摘要: 机器遗忘对于强制执行数据删除权利如“被遗忘权”至关重要。作为一种分散式范式,联邦学习(FL)也需要遗忘,但实际的实施面临两个主要挑战。首先,在联邦遗忘(FU)中经常忽视公平性。精确的遗忘方法通常会强制所有客户端进行昂贵的重新训练,即使那些没有参与的客户端也是如此。使用梯度上升或精炼的近似方法可能会进行粗略的干预,这可能会不公平地降低仅保留数据的客户端的性能。其次,大多数FU评估依赖于假设的合成数据(IID/non-IID),忽视了真实世界的异质性。这些不现实的基准模糊了遗忘的真实影响,并限制了当前方法的适用性。我们首先在现实数据异质性和公平性条件下对现有的FU方法进行全面的基准测试。然后,我们提出一种新颖的、关注公平性的FU方法,即联邦交叉客户端约束遗忘(FedCCCU),以明确解决这两个挑战。FedCCCU为现实世界的FU提供了实用且可扩展的解决方案。实验结果表明,现有方法在现实环境中表现不佳,而我们的方法始终优于它们。
更新时间: 2025-10-08 13:47:19
领域: cs.LG,cs.AI
DiffMI: Breaking Face Recognition Privacy via Diffusion-Driven Training-Free Model Inversion
Face recognition poses serious privacy risks due to its reliance on sensitive and immutable biometric data. While modern systems mitigate privacy risks by mapping facial images to embeddings (commonly regarded as privacy-preserving), model inversion attacks reveal that identity information can still be recovered, exposing critical vulnerabilities. However, existing attacks are often computationally expensive and lack generalization, especially those requiring target-specific training. Even training-free approaches suffer from limited identity controllability, hindering faithful reconstruction of nuanced or unseen identities. In this work, we propose DiffMI, the first diffusion-driven, training-free model inversion attack. DiffMI introduces a novel pipeline combining robust latent code initialization, a ranked adversarial refinement strategy, and a statistically grounded, confidence-aware optimization objective. DiffMI applies directly to unseen target identities and face recognition models, offering greater adaptability than training-dependent approaches while significantly reducing computational overhead. Our method achieves 84.42%--92.87% attack success rates against inversion-resilient systems and outperforms the best prior training-free GAN-based approach by 4.01%--9.82%. The implementation is available at https://github.com/azrealwang/DiffMI.
Updated: 2025-10-08 13:46:41
标题: DiffMI:通过扩散驱动的无需训练的模型反演来突破人脸识别隐私
摘要: 人脸识别由于依赖于敏感且不可变的生物特征数据,存在严重的隐私风险。现代系统通过将面部图像映射到嵌入(通常被认为是保护隐私的)来减轻隐私风险,但模型反演攻击显示身份信息仍然可以被恢复,暴露了关键的漏洞。然而,现有的攻击通常计算成本高昂且缺乏泛化能力,尤其是那些需要特定目标训练的攻击。即使是无需训练的方法也存在身份可控性有限的问题,阻碍了对微妙或未知身份的忠实重建。在这项工作中,我们提出了DiffMI,这是第一个基于扩散驱动的、无需训练的模型反演攻击。DiffMI引入了一个新的流程,结合了强大的潜在代码初始化、一个排名对抗性改进策略,以及一个基于统计的、具有信心感知的优化目标。DiffMI可直接应用于未知目标身份和人脸识别模型,比依赖于训练的方法具有更大的适应性,同时显著减少计算开销。我们的方法在抗反演系统中取得了84.42%至92.87%的攻击成功率,并且优于先前最佳的无需训练的基于GAN的方法4.01%至9.82%。该实现可在https://github.com/azrealwang/DiffMI 上找到。
更新时间: 2025-10-08 13:46:41
领域: cs.CR,cs.CV,cs.LG
Native Hybrid Attention for Efficient Sequence Modeling
Transformers excel at sequence modeling but face quadratic complexity, while linear attention offers improved efficiency but often compromises recall accuracy over long contexts. In this work, we introduce Native Hybrid Attention (NHA), a novel hybrid architecture of linear and full attention that integrates both intra \& inter-layer hybridization into a unified layer design. NHA maintains long-term context in key-value slots updated by a linear RNN, and augments them with short-term tokens from a sliding window. A single \texttt{softmax attention} operation is then applied over all keys and values, enabling per-token and per-head context-dependent weighting without requiring additional fusion parameters. The inter-layer behavior is controlled through a single hyperparameter, the sliding window size, which allows smooth adjustment between purely linear and full attention while keeping all layers structurally uniform. Experimental results show that NHA surpasses Transformers and other hybrid baselines on recall-intensive and commonsense reasoning tasks. Furthermore, pretrained LLMs can be structurally hybridized with NHA, achieving competitive accuracy while delivering significant efficiency gains. Code is available at https://github.com/JusenD/NHA.
Updated: 2025-10-08 13:44:57
标题: 原生混合注意力用于高效序列建模
摘要: Transformer在序列建模方面表现出色,但面临二次复杂性,而线性注意力提供了改进的效率,但往往会在长文本上牺牲召回精度。在这项工作中,我们引入了原生混合注意力(NHA),这是一种线性和完全注意力的新型混合架构,将层内和层间混合化整合到统一的层设计中。NHA通过线性RNN更新的键-值槽中维护长期上下文,并利用滑动窗口中的短期标记进行增强。然后,在所有键和值上应用单个softmax注意力操作,实现每个标记和每个头部的上下文相关加权,无需额外的融合参数。通过一个超参数控制层间行为,即滑动窗口大小,可以在保持所有层结构统一的同时在纯线性和完全注意力之间实现平滑调整。实验结果表明,NHA在召回密集和常识推理任务上超越了Transformer和其他混合基线。此外,预训练的LLM可以与NHA结构混合,实现竞争性的准确性,同时带来显著的效率提升。代码可在https://github.com/JusenD/NHA找到。
更新时间: 2025-10-08 13:44:57
领域: cs.CL,cs.AI,cs.LG
Sharpness-Aware Data Generation for Zero-shot Quantization
Zero-shot quantization aims to learn a quantized model from a pre-trained full-precision model with no access to original real training data. The common idea in zero-shot quantization approaches is to generate synthetic data for quantizing the full-precision model. While it is well-known that deep neural networks with low sharpness have better generalization ability, none of the previous zero-shot quantization works considers the sharpness of the quantized model as a criterion for generating training data. This paper introduces a novel methodology that takes into account quantized model sharpness in synthetic data generation to enhance generalization. Specifically, we first demonstrate that sharpness minimization can be attained by maximizing gradient matching between the reconstruction loss gradients computed on synthetic and real validation data, under certain assumptions. We then circumvent the problem of the gradient matching without real validation set by approximating it with the gradient matching between each generated sample and its neighbors. Experimental evaluations on CIFAR-100 and ImageNet datasets demonstrate the superiority of the proposed method over the state-of-the-art techniques in low-bit quantization settings.
Updated: 2025-10-08 13:43:39
标题: 锐度感知数据生成用于零-shot 量化
摘要: 零-shot量化旨在从预训练的全精度模型中学习一个量化模型,而无需访问原始真实训练数据。零-shot量化方法的共同思想是为全精度模型生成合成数据进行量化。尽管众所周知,具有较低尖锐度的深度神经网络具有更好的泛化能力,但以往的零-shot量化工作中没有考虑量化模型的尖锐度作为生成训练数据的标准。本文介绍了一种新的方法论,该方法在合成数据生成中考虑了量化模型的尖锐度以增强泛化能力。具体来说,我们首先证明,通过在合成和真实验证数据上计算的重构损失梯度之间的梯度匹配,可以实现尖锐度最小化,假设一定条件成立。然后,我们通过使用每个生成样本和其邻居之间的梯度匹配来绕过梯度匹配问题,而不使用真实验证集。在CIFAR-100和ImageNet数据集上的实验评估表明,所提出的方法在低位量化设置中优于现有技术。
更新时间: 2025-10-08 13:43:39
领域: cs.LG,cs.CV
Minimal Cascade Gradient Smoothing for Fast Transferable Preemptive Adversarial Defense
Adversarial attacks persist as a major challenge in deep learning. While training- and test-time defenses are well-studied, they often reduce clean accuracy, incur high cost, or fail under adaptive threats. In contrast, preemptive defenses, which perturb media before release, offer a practical alternative but remain slow, model-coupled, and brittle. We propose the Minimal Sufficient Preemptive Defense (MSPD), a fast, transferable framework that defends against future attacks without access to the target model or gradients. MSPD is driven by Minimal Cascade Gradient Smoothing (MCGS), a two-epoch optimization paradigm executed on a surrogate backbone. This defines a minimal yet effective regime for robust generalization across unseen models and attacks. MSPD runs at 0.02s/image (CIFAR-10) and 0.26s/image (ImageNet), 28--1696x faster than prior preemptive methods, while improving robust accuracy by +5% and clean accuracy by +3.7% across 11 models and 7 attacks. To evaluate adaptive robustness, we introduce Preemptive Reversion, the first white-box diagnostic attack that cancels preemptive perturbations under full gradient access. Even in this setting, MSPD retains a +2.2% robustness margin over the baseline. In practice, when gradients are unavailable, MSPD remains reliable and efficient. MSPD, MCGS, and Preemptive Reversion are each supported by formal theoretical proofs. The implementation is available at https://github.com/azrealwang/MSPD.
Updated: 2025-10-08 13:35:28
标题: 最小级联梯度平滑用于快速可转移的抢占性对抗防御
摘要: 对抗性攻击仍然是深度学习中的一个重大挑战。虽然训练和测试时的防御机制已经得到深入研究,但它们往往会降低干净的准确性,产生高成本或在适应性威胁下失败。相比之下,预防性防御措施在发布前扰乱媒体,提供了一种实用的替代方案,但仍然缓慢、与模型耦合且脆弱。我们提出了最小必要的预防性防御(MSPD),这是一个快速、可转移的框架,可以在没有目标模型或梯度访问的情况下抵御未来的攻击。MSPD由最小级联梯度平滑(MCGS)驱动,这是在一个替代骨干上执行的两个时期的优化范式。这为跨越未见模型和攻击的强大泛化定义了一个最小而有效的制度。MSPD在每张图片上运行时为0.02秒(CIFAR-10)和0.26秒(ImageNet),比先前的预防性方法快28-1696倍,同时在11个模型和7种攻击中将鲁棒性准确性提高了5%,干净准确性提高了3.7%。为了评估自适应鲁棒性,我们引入了预防性复原,这是第一个在完全梯度访问下取消预防性扰动的白盒诊断攻击。即使在这种情况下,MSPD仍然比基线保留了+2.2%的鲁棒性边缘。在实践中,当梯度不可用时,MSPD仍然可靠和高效。MSPD、MCGS和预防性复原均受到正式的理论证明支持。实现可在https://github.com/azrealwang/MSPD上找到。
更新时间: 2025-10-08 13:35:28
领域: cs.CR
Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots
Adaptive recovery from fall incidents are essential skills for the practical deployment of wheeled-legged robots, which uniquely combine the agility of legs with the speed of wheels for rapid recovery. However, traditional methods relying on preplanned recovery motions, simplified dynamics or sparse rewards often fail to produce robust recovery policies. This paper presents a learning-based framework integrating Episode-based Dynamic Reward Shaping and curriculum learning, which dynamically balances exploration of diverse recovery maneuvers with precise posture refinement. An asymmetric actor-critic architecture accelerates training by leveraging privileged information in simulation, while noise-injected observations enhance robustness against uncertainties. We further demonstrate that synergistic wheel-leg coordination reduces joint torque consumption by 15.8% and 26.2% and improves stabilization through energy transfer mechanisms. Extensive evaluations on two distinct quadruped platforms achieve recovery success rates up to 99.1% and 97.8% without platform-specific tuning. The supplementary material is available at https://boyuandeng.github.io/L2R-WheelLegCoordination/
Updated: 2025-10-08 13:33:28
标题: 学习恢复:借助轮腿协调的动态奖励塑造来帮助倒下的机器人
摘要: 摔倒事件的自适应恢复是轮腿机器人实际部署的关键技能,这些机器人独特地将腿部的灵活性与轮子的速度结合,以实现快速恢复。然而,传统方法依赖于预先规划的恢复动作、简化的动力学或稀疏的奖励往往无法产生稳健的恢复策略。本文提出了一个基于学习的框架,集成了基于情节的动态奖励塑造和课程学习,动态平衡了对多样化恢复机动的探索和对姿势精细调整的需求。一个不对称的演员-评论家架构通过利用模拟中的特权信息来加速训练,同时注入噪声的观测增强了对不确定性的稳健性。我们进一步证明了协调轮腿的协同作用可以将关节扭矩消耗降低15.8%和26.2%,并通过能量传递机制改善稳定性。在两个不同的四足平台上进行了广泛的评估,恢复成功率高达99.1%和97.8%,无需特定平台的调整。补充材料可在https://boyuandeng.github.io/L2R-WheelLegCoordination/ 上找到。
更新时间: 2025-10-08 13:33:28
领域: cs.RO,cs.AI,cs.LG
Interpretable Robot Control via Structured Behavior Trees and Large Language Models
As intelligent robots become more integrated into human environments, there is a growing need for intuitive and reliable Human-Robot Interaction (HRI) interfaces that are adaptable and more natural to interact with. Traditional robot control methods often require users to adapt to interfaces or memorize predefined commands, limiting usability in dynamic, unstructured environments. This paper presents a novel framework that bridges natural language understanding and robotic execution by combining Large Language Models (LLMs) with Behavior Trees. This integration enables robots to interpret natural language instructions given by users and translate them into executable actions by activating domain-specific plugins. The system supports scalable and modular integration, with a primary focus on perception-based functionalities, such as person tracking and hand gesture recognition. To evaluate the system, a series of real-world experiments was conducted across diverse environments. Experimental results demonstrate that the proposed approach is practical in real-world scenarios, with an average cognition-to-execution accuracy of approximately 94%, making a significant contribution to HRI systems and robots. The complete source code of the framework is publicly available at https://github.com/snt-arg/robot_suite.
Updated: 2025-10-08 13:28:52
标题: 通过结构化行为树和大型语言模型实现可解释的机器人控制
摘要: 随着智能机器人逐渐融入人类环境,对直观可靠的人机交互(HRI)界面的需求日益增长,这些界面需要具有适应性和更自然的交互方式。传统的机器人控制方法通常要求用户适应界面或记忆预定义命令,从而限制了在动态、非结构化环境中的可用性。本文提出了一个新颖的框架,通过将大型语言模型(LLMs)与行为树结合,构建了自然语言理解和机器人执行之间的桥梁。这种集成使机器人能够解释用户给出的自然语言指令,并通过激活特定领域插件将其转化为可执行动作。该系统支持可伸缩和模块化集成,主要关注基于感知的功能,如人员追踪和手势识别。为了评估该系统,进行了一系列在不同环境中的真实实验。实验结果表明,所提出的方法在现实场景中是实用的,平均认知到执行的准确率约为94%,为HRI系统和机器人做出了重要贡献。该框架的完整源代码可在https://github.com/snt-arg/robot_suite 上公开获取。
更新时间: 2025-10-08 13:28:52
领域: cs.RO,cs.AI,cs.LG
GreedyPixel: Fine-Grained Black-Box Adversarial Attack Via Greedy Algorithm
Deep neural networks are highly vulnerable to adversarial examples that inputs with small, carefully crafted perturbations that cause misclassification, making adversarial attacks an essential tool for robustness evaluation. Existing black-box attacks fall into three categories: query-only, transfer-only, and query-and-transfer, and vary in perturbation pattern and optimization strategy. However, no prior method jointly achieves query-and-transfer guidance, pixel-wise sparsity, and training-free direct optimization, leaving a gap between black-box flexibility and white-box precision. We present GreedyPixel, a new attack framework that fills this gap by combining a surrogate-derived pixel priority map with greedy, per-pixel optimization refined by query feedback. This design reduces the exponential brute-force search space to a tractable linear procedure, guarantees monotonic loss decrease and convergence to a coordinate-wise optimum, and concentrates perturbations on robust, semantically meaningful pixels to improve perceptual quality. Extensive experiments on CIFAR-10 and ImageNet under both white-box and black-box settings demonstrate that GreedyPixel achieves state-of-the-art attack success rates and produces visually imperceptible perturbations. Our results show that GreedyPixel bridges the precision gap between white-box and black-box attacks and provides a practical framework for fine-grained robustness evaluation. The implementation is available at https://github.com/azrealwang/greedypixel.
Updated: 2025-10-08 13:27:03
标题: GreedyPixel:通过贪婪算法进行细粒度黑盒对抗攻击
摘要: 深度神经网络对于对抗性示例非常脆弱,即通过微小、精心设计的扰动使输入被错误分类,使对抗攻击成为鲁棒性评估的重要工具。现有的黑盒攻击可分为三类:仅查询、仅转移和查询与转移,并且在扰动模式和优化策略上有所不同。然而,以往的方法没有同时实现查询与转移引导、像素级稀疏性和无需训练的直接优化,导致黑盒灵活性与白盒精度之间存在差距。我们提出了GreedyPixel,这是一个新的攻击框架,通过将由替代模型导出的像素优先级图与贪婪、逐像素优化相结合,并通过查询反馈进行优化。这种设计将指数级的暴力搜索空间减少为可处理的线性过程,保证了损失的单调减少,收敛到逐坐标最优解,将扰动集中在稳健、语义有意义的像素上,以提高感知质量。在CIFAR-10和ImageNet上进行的广泛实验表明,GreedyPixel在白盒和黑盒设置下都取得了最先进的攻击成功率,并产生了视觉上无法察觉的扰动。我们的结果表明,GreedyPixel填补了白盒和黑盒攻击之间的精度差距,并为细粒度的鲁棒性评估提供了一个实用的框架。实现可在https://github.com/azrealwang/greedypixel找到。
更新时间: 2025-10-08 13:27:03
领域: cs.CV,cs.CR,cs.LG
Root Cause Analysis of Outliers in Unknown Cyclic Graphs
We study the propagation of outliers in cyclic causal graphs with linear structural equations, tracing them back to one or several "root cause" nodes. We show that it is possible to identify a short list of potential root causes provided that the perturbation is sufficiently strong and propagates according to the same structural equations as in the normal mode. This shortlist consists of the true root causes together with those of its parents lying on a cycle with the root cause. Notably, our method does not require prior knowledge of the causal graph.
Updated: 2025-10-08 13:19:01
标题: 未知循环图中离群值的根本原因分析
摘要: 我们研究了具有线性结构方程的循环因果图中异常值的传播,并将它们追溯到一个或多个“根本原因”节点。我们表明,只要扰动足够强,且按照与正常模式相同的结构方程传播,就有可能确定潜在的根本原因的一个简短列表。这个简短列表包括真正的根本原因以及在与根本原因形成循环的父节点。值得注意的是,我们的方法不需要先前对因果图的了解。
更新时间: 2025-10-08 13:19:01
领域: stat.ML,cs.LG,stat.ME
RedTWIZ: Diverse LLM Red Teaming via Adaptive Attack Planning
This paper presents the vision, scientific contributions, and technical details of RedTWIZ: an adaptive and diverse multi-turn red teaming framework, to audit the robustness of Large Language Models (LLMs) in AI-assisted software development. Our work is driven by three major research streams: (1) robust and systematic assessment of LLM conversational jailbreaks; (2) a diverse generative multi-turn attack suite, supporting compositional, realistic and goal-oriented jailbreak conversational strategies; and (3) a hierarchical attack planner, which adaptively plans, serializes, and triggers attacks tailored to specific LLM's vulnerabilities. Together, these contributions form a unified framework -- combining assessment, attack generation, and strategic planning -- to comprehensively evaluate and expose weaknesses in LLMs' robustness. Extensive evaluation is conducted to systematically assess and analyze the performance of the overall system and each component. Experimental results demonstrate that our multi-turn adversarial attack strategies can successfully lead state-of-the-art LLMs to produce unsafe generations, highlighting the pressing need for more research into enhancing LLM's robustness.
Updated: 2025-10-08 13:18:42
标题: RedTWIZ:通过自适应攻击规划实现多样化的LLM红队行动
摘要: 本文介绍了RedTWIZ的愿景、科学贡献和技术细节:RedTWIZ是一种自适应和多样化的多轮红队框架,用于审计人工智能辅助软件开发中大型语言模型(LLMs)的稳健性。我们的工作受到三个主要研究领域的驱动:(1)对LLM会话破解的稳健和系统化评估;(2)支持构成性、真实和目标导向破解会话策略的多样化生成式多轮攻击套件;和(3)层次化攻击规划器,自适应地规划、序列化和触发针对特定LLM漏洞的攻击。这些贡献共同构成了一个统一框架——结合评估、攻击生成和战略规划——全面评估和暴露LLM稳健性的弱点。我们进行了广泛的评估,系统评估和分析了整个系统和每个组件的性能。实验结果表明,我们的多轮对抗攻击策略可以成功地引导最先进的LLM生成不安全的内容,突显了更多研究以增强LLM稳健性的紧迫需要。
更新时间: 2025-10-08 13:18:42
领域: cs.CR,cs.CL
A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport
Accurate sequence-to-sequence (seq2seq) alignment is critical for applications like medical speech analysis and language learning tools relying on automatic speech recognition (ASR). State-of-the-art end-to-end (E2E) ASR systems, such as the Connectionist Temporal Classification (CTC) and transducer-based models, suffer from peaky behavior and alignment inaccuracies. In this paper, we propose a novel differentiable alignment framework based on one-dimensional optimal transport, enabling the model to learn a single alignment and perform ASR in an E2E manner. We introduce a pseudo-metric, called Sequence Optimal Transport Distance (SOTD), over the sequence space and discuss its theoretical properties. Based on the SOTD, we propose Optimal Temporal Transport Classification (OTTC) loss for ASR and contrast its behavior with CTC. Experimental results on the TIMIT, AMI, and LibriSpeech datasets show that our method considerably improves alignment performance compared to CTC and the more recently proposed Consistency-Regularized CTC, though with a trade-off in ASR performance. We believe this work opens new avenues for seq2seq alignment research, providing a solid foundation for further exploration and development within the community.
Updated: 2025-10-08 13:13:58
标题: 一种基于最优输运的序列到序列建模的可微对齐框架
摘要: 准确的序列到序列(seq2seq)对齐对于依赖自动语音识别(ASR)的医学语音分析和语言学习工具等应用至关重要。最先进的端到端(E2E)ASR系统,如连接主义时间分类(CTC)和基于传输器的模型,存在尖峰行为和对齐不准确的问题。在本文中,我们提出了一种基于一维最优传输的新型可微对齐框架,使模型能够学习单一对齐并以E2E方式执行ASR。我们引入了一种伪度量,称为序列最优传输距离(SOTD),在序列空间上讨论其理论特性。基于SOTD,我们提出了用于ASR的最优时空传输分类(OTTC)损失,并将其行为与CTC进行对比。在TIMIT、AMI和LibriSpeech数据集上的实验结果表明,与CTC和最近提出的一致性正则化CTC相比,我们的方法显著改善了对齐性能,尽管在ASR性能方面存在一定的权衡。我们相信这项工作为seq2seq对齐研究开辟了新的途径,为社区内进一步探索和发展提供了坚实的基础。
更新时间: 2025-10-08 13:13:58
领域: cs.LG,cs.SD,eess.AS,stat.ML
Spiral Model Technique For Data Science & Machine Learning Lifecycle
Analytics play an important role in modern business. Companies adapt data science lifecycles to their culture to seek productivity and improve their competitiveness among others. Data science lifecycles are fairly an important contributing factor to start and end a project that are data dependent. Data science and Machine learning life cycles comprises of series of steps that are involved in a project. A typical life cycle states that it is a linear or cyclical model that revolves around. It is mostly depicted that it is possible in a traditional data science life cycle to start the process again after reaching the end of cycle. This paper suggests a new technique to incorporate data science life cycle to business problems that have a clear end goal. A new technique called spiral technique is introduced to emphasize versatility, agility and iterative approach to business processes.
Updated: 2025-10-08 13:11:58
标题: 螺旋模型技术在数据科学与机器学习生命周期中的应用
摘要: 分析在现代商业中扮演着重要角色。公司根据其文化调整数据科学生命周期,以追求生产率并提高竞争力。数据科学生命周期是启动和结束依赖数据的项目的一个重要因素。数据科学和机器学习生命周期包括项目中涉及的一系列步骤。一个典型的生命周期陈述称,它是围绕着一个线性或循环模型。通常可以看到,在传统数据科学生命周期中,可以在到达周期结束后重新启动过程。本文提出了一种新技术,将数据科学生命周期整合到具有明确终极目标的业务问题中。引入了一种名为螺旋技术的新技术,以强调对业务流程的多样性、灵活性和迭代方法。
更新时间: 2025-10-08 13:11:58
领域: cs.LG,cs.IR,cs.SE
Revisiting Mixout: An Overlooked Path to Robust Finetuning
Finetuning vision foundation models often improves in-domain accuracy but comes at the cost of robustness under distribution shift. We revisit Mixout, a stochastic regularizer that intermittently replaces finetuned weights with their pretrained reference, through the lens of a single-run, weight-sharing implicit ensemble. This perspective reveals three key levers that govern robustness: the \emph{masking anchor}, \emph{resampling frequency}, and \emph{mask sparsity}. Guided by this analysis, we introduce GMixout, which (i) replaces the fixed anchor with an exponential moving-average snapshot that adapts during training, and (ii) regulates masking period via an explicit resampling-frequency hyperparameter. Our sparse-kernel implementation updates only a small fraction of parameters with no inference-time overhead, enabling training on consumer-grade GPUs. Experiments on benchmarks covering covariate shift, corruption, and class imbalance, ImageNet / ImageNet-LT, DomainNet, iWildCam, and CIFAR100-C, GMixout consistently improves in-domain accuracy beyond zero-shot performance while surpassing both Model Soups and strong parameter-efficient finetuning baselines under distribution shift.
Updated: 2025-10-08 13:07:50
标题: 重新审视Mixout:一种被忽视的强化微调路径
摘要: 微调视觉基础模型通常会提高领域内准确性,但会以在分布转移下的鲁棒性为代价。我们重新审视Mixout,这是一种随机正则化器,通过单次运行、权重共享的隐式集成视角,间歇性地用预训练的参考权重替换微调权重。这个视角揭示了三个关键杠杆,它们控制鲁棒性:\emph{掩码锚}、\emph{重采样频率}和\emph{掩码稀疏性}。在这一分析的指导下,我们引入了GMixout,它(i)用指数移动平均快照替换固定锚,适应训练过程中,并且(ii)通过显式重采样频率超参数调节掩码周期。我们的稀疏核实现只更新了一小部分参数,没有推理时间开销,可以在消费级GPU上进行训练。在覆盖协变量转移、损坏和类别不平衡的基准测试上,如ImageNet / ImageNet-LT,DomainNet,iWildCam和CIFAR100-C,GMixout始终提高了领域内准确性,超越了零-shot性能,同时在分布转移下超过了模型混合和强参数高效微调基线。
更新时间: 2025-10-08 13:07:50
领域: cs.LG,cs.CV
Relational Database Distillation: From Structured Tables to Condensed Graph Data
Relational databases (RDBs) underpin the majority of global data management systems, where information is structured into multiple interdependent tables. To effectively use the knowledge within RDBs for predictive tasks, recent advances leverage graph representation learning to capture complex inter-table relations as multi-hop dependencies. Despite achieving state-of-the-art performance, these methods remain hindered by the prohibitive storage overhead and excessive training time, due to the massive scale of the database and the computational burden of intensive message passing across interconnected tables. To alleviate these concerns, we propose and study the problem of Relational Database Distillation (RDD). Specifically, we aim to distill large-scale RDBs into compact heterogeneous graphs while retaining the predictive power (i.e., utility) required for training graph-based models. Multi-modal column information is preserved through node features, and primary-foreign key relations are encoded via heterogeneous edges, thereby maintaining both data fidelity and relational structure. To ensure adaptability across diverse downstream tasks without engaging the traditional, inefficient bi-level distillation framework, we further design a kernel ridge regression-guided objective with pseudo-labels, which produces quality features for the distilled graph. Extensive experiments on multiple real-world RDBs demonstrate that our solution substantially reduces the data size while maintaining competitive performance on classification and regression tasks, creating an effective pathway for scalable learning with RDBs.
Updated: 2025-10-08 13:05:31
标题: 关系数据库精炼:从结构化表到压缩图数据
摘要: 关系数据库(RDBs)支撑着全球大多数数据管理系统,其中信息被结构化成多个相互依赖的表格。为了有效地利用RDBs中的知识进行预测任务,最近的进展利用图表示学习来捕捉复杂的跨表关系作为多跳依赖。尽管取得了最先进的性能,但这些方法仍然受到存储开销和过多的训练时间的限制,这是由于数据库的大规模和跨连接表之间的密集消息传递的计算负担所致。为了缓解这些问题,我们提出并研究了关系数据库蒸馏(RDD)的问题。具体而言,我们旨在将大规模的RDBs蒸馏成紧凑的异构图,同时保留训练基于图的模型所需的预测能力(即效用)。通过节点特征保留多模式列信息,并通过异构边编码主外键关系,从而保持数据的忠实性和关系结构。为了确保在不涉及传统低效的双层蒸馏框架的情况下,能够适应各种不同的下游任务,我们进一步设计了一个由核岭回归引导的目标,配合伪标签,为蒸馏图生成高质量特征。对多个真实世界RDBs的广泛实验表明,我们的解决方案显著减小了数据大小,同时在分类和回归任务上保持了竞争性能,为与RDBs的可扩展学习提供了有效路径。
更新时间: 2025-10-08 13:05:31
领域: cs.DB,cs.LG
VelLMes: A high-interaction AI-based deception framework
There are very few SotA deception systems based on Large Language Models. The existing ones are limited only to simulating one type of service, mainly SSH shells. These systems - but also the deception technologies not based on LLMs - lack an extensive evaluation that includes human attackers. Generative AI has recently become a valuable asset for cybersecurity researchers and practitioners, and the field of cyber-deception is no exception. Researchers have demonstrated how LLMs can be leveraged to create realistic-looking honeytokens, fake users, and even simulated systems that can be used as honeypots. This paper presents an AI-based deception framework called VelLMes, which can simulate multiple protocols and services such as SSH Linux shell, MySQL, POP3, and HTTP. All of these can be deployed and used as honeypots, thus VelLMes offers a variety of choices for deception design based on the users' needs. VelLMes is designed to be attacked by humans, so interactivity and realism are key for its performance. We evaluate the generative capabilities and the deception capabilities. Generative capabilities were evaluated using unit tests for LLMs. The results of the unit tests show that, with careful prompting, LLMs can produce realistic-looking responses, with some LLMs having a 100% passing rate. In the case of the SSH Linux shell, we evaluated deception capabilities with 89 human attackers. The results showed that about 30% of the attackers thought that they were interacting with a real system when they were assigned an LLM-based honeypot. Lastly, we deployed 10 instances of the SSH Linux shell honeypot on the Internet to capture real-life attacks. Analysis of these attacks showed us that LLM honeypots simulating Linux shells can perform well against unstructured and unexpected attacks on the Internet, responding correctly to most of the issued commands.
Updated: 2025-10-08 13:00:23
标题: VelLMes:一个基于高交互性人工智能的欺骗框架
摘要: 目前基于大型语言模型的欺骗系统非常少见。现有的系统仅限于模拟一种类型的服务,主要是SSH shells。这些系统 - 以及不基于LLMs的欺骗技术 - 缺乏包括人类攻击者在内的广泛评估。生成式人工智能最近已成为网络安全研究人员和从业者的宝贵资产,网络欺骗领域也不例外。研究人员已经展示了如何利用LLMs来创建逼真的蜜罐标记、虚假用户,甚至可以用作蜜罐的模拟系统。本文介绍了一种名为VelLMes的基于人工智能的欺骗框架,可以模拟多个协议和服务,如SSH Linux shell、MySQL、POP3和HTTP。所有这些都可以部署并用作蜜罐,因此VelLMes根据用户的需求提供了多种欺骗设计选择。VelLMes被设计为可以受到人类攻击的系统,因此交互性和真实性对其性能至关重要。我们评估了生成能力和欺骗能力。生成能力是通过LLMs的单元测试进行评估的。单元测试的结果显示,在仔细提示下,LLMs可以产生逼真的响应,其中一些LLMs的通过率达到100%。在SSH Linux shell的情况下,我们通过89个人类攻击者评估了欺骗能力。结果显示,约30%的攻击者认为他们与一个真实系统进行交互,当他们被分配一个基于LLM的蜜罐时。最后,我们在互联网上部署了10个实例的SSH Linux shell蜜罐,以捕获真实生活中的攻击。对这些攻击的分析显示,模拟Linux shells的LLM蜜罐可以很好地对抗互联网上的无序和意外攻击,并对大多数发出的命令作出正确响应。
更新时间: 2025-10-08 13:00:23
领域: cs.CR,cs.AI,cs.CL
Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach
In deep Reinforcement Learning (RL), the learning rate critically influences both stability and performance, yet its optimal value shifts during training as the environment and policy evolve. Standard decay schedulers assume monotonic convergence and often misalign with these dynamics, leading to premature or delayed adjustments. We introduce LRRL, a meta-learning approach that dynamically selects the learning rate based on policy performance rather than training steps. LRRL adaptively favors rates that improve returns, remaining robust even when the candidate set includes values that individually cause divergence. Across Atari and MuJoCo benchmarks, LRRL achieves performance competitive with or superior to tuned baselines and standard schedulers. Our findings position LRRL as a practical solution for adapting to non-stationary objectives in deep RL.
Updated: 2025-10-08 12:58:01
标题: 深度强化学习的动态学习率:一种赌博机方法
摘要: 在深度强化学习(RL)中,学习率对稳定性和性能都有重大影响,然而其最优值在训练过程中会随着环境和策略的演变而发生变化。标准的衰减调度器假定单调收敛,并且通常与这些动态不一致,导致过早或延迟的调整。我们引入了LRRL,这是一种基于策略表现而非训练步骤动态选择学习率的元学习方法。LRRL自适应地偏好能够提高回报的学习率,即使候选集包含导致发散的值,仍然保持稳健性。在Atari和MuJoCo基准测试中,LRRL的性能与或优于调整过的基线和标准调度器。我们的研究结果将LRRL定位为适应深度RL中非稳态目标的实际解决方案。
更新时间: 2025-10-08 12:58:01
领域: cs.LG
Falsification-Driven Reinforcement Learning for Maritime Motion Planning
Compliance with maritime traffic rules is essential for the safe operation of autonomous vessels, yet training reinforcement learning (RL) agents to adhere to them is challenging. The behavior of RL agents is shaped by the training scenarios they encounter, but creating scenarios that capture the complexity of maritime navigation is non-trivial, and real-world data alone is insufficient. To address this, we propose a falsification-driven RL approach that generates adversarial training scenarios in which the vessel under test violates maritime traffic rules, which are expressed as signal temporal logic specifications. Our experiments on open-sea navigation with two vessels demonstrate that the proposed approach provides more relevant training scenarios and achieves more consistent rule compliance.
Updated: 2025-10-08 12:56:31
标题: 基于伪造驱动的海洋运动规划强化学习
摘要: 遵守海上交通规则对于自主船舶的安全运行至关重要,然而训练强化学习(RL)代理程序遵守这些规则是具有挑战性的。RL代理程序的行为受其遇到的训练场景的影响,但创建能够捕捉海上航行复杂性的场景并非易事,仅凭真实世界数据是不足够的。为了解决这个问题,我们提出了一个基于反证的RL方法,生成对抗性训练场景,在这些场景中测试船只违反海上交通规则,这些规则以信号时间逻辑规范表达。我们在两艘船的开放海域导航实验中表明,所提出的方法提供了更相关的训练场景,并实现了更一致的规则遵守。
更新时间: 2025-10-08 12:56:31
领域: eess.SY,cs.LG,cs.SY
Möbius transforms and Shapley values for vector-valued functions on weighted directed acyclic multigraphs
We generalize the concept of M\"obius inversion and Shapley values to directed acyclic multigraphs and weighted versions thereof. We further allow value functions (games) and thus their M\"obius transforms (synergy function) and Shapley values to have values in any abelian group that is a module over a ring that contains the graph weights, e.g. vector-valued functions. To achieve this and overcome the obstruction that the classical axioms (linearity, efficiency, null player, symmetry) are not strong enough to uniquely determine Shapley values in this more general setting, we analyze Shapley values from two novel points of view: 1) We introduce projection operators that allow us to interpret Shapley values as the recursive projection and re-attribution of higher-order synergies to lower-order ones; 2) we propose a strengthening of the null player axiom and a localized symmetry axiom, namely the weak elements and flat hierarchy axioms. The former allows us to remove coalitions with vanishing synergy while preserving the rest of the hierarchical structure. The latter treats player-coalition bonds uniformly in the corner case of hierarchically flat graphs. Together with linearity these axioms already imply a unique explicit formula for the Shapley values, as well as classical properties like efficiency, null player, symmetry, and novel ones like the projection property. This whole framework then specializes to finite inclusion algebras, lattices, partial orders and mereologies, and also recovers certain previously known cases as corner cases, and presents others from a new perspective. The admission of general weighted directed acyclic multigraph structured hierarchies and vector-valued functions and Shapley values opens up the possibility for new analytic tools and application areas, like machine learning, language processing, explainable artificial intelligence, and many more.
Updated: 2025-10-08 12:55:31
标题: 莫比乌斯变换和沙普利值在加权有向无环多图上的矢量值函数中的应用
摘要: 我们将M\"obius反演和Shapley值的概念推广到有向无环多重图及其加权版本。我们进一步允许价值函数(游戏)以及它们的M\"obius变换(协同函数)和Shapley值在任何阿贝尔群中取值,该群是包含图权重的环的模。为了实现这一点并克服经典公理(线性、效率、空玩家、对称性)在这种更一般的情况下无法唯一确定Shapley值的障碍,我们从两个新颖的角度分析Shapley值:1)我们引入投影算子,使我们能够将Shapley值解释为将高阶协同递归投影和重新归因给低阶协同;2)我们提出了空玩家公理的加强和局部对称性公理,即弱元素和平坦层次结构公理。前者允许我们去除具有消失协同的联盟,同时保留层次结构的其余部分。后者在层次平坦图的角落案例中统一处理玩家-联盟的联系。与线性一起,这些公理已经暗示了Shapley值的唯一显式公式,以及效率、空玩家、对称性等经典性质,以及投影性质等新颖性质。整个框架随后特化为有限包含代数、格、偏序和部分整体论,并且从新的视角呈现某些以前已知的情况作为极端情况,并展示了其他情况。允许一般加权有向无环多重图结构层次和向量值函数以及Shapley值的存在,为新的分析工具和应用领域打开了可能性,如机器学习、语言处理、可解释人工智能等许多领域。
更新时间: 2025-10-08 12:55:31
领域: cs.GT,cs.DM,cs.LG,math.CO,91A12 (Primary) 06A07, 05E99 (Secondary),F.2.2; G.2.1; G.2.2; I.2.0
Contrastive Graph Condensation: Advancing Data Versatility through Self-Supervised Learning
With the increasing computation of training graph neural networks (GNNs) on large-scale graphs, graph condensation (GC) has emerged as a promising solution to synthesize a compact, substitute graph of the large-scale original graph for efficient GNN training. However, existing GC methods predominantly employ classification as the surrogate task for optimization, thus excessively relying on node labels and constraining their utility in label-sparsity scenarios. More critically, this surrogate task tends to overfit class-specific information within the condensed graph, consequently restricting the generalization capabilities of GC for other downstream tasks. To address these challenges, we introduce Contrastive Graph Condensation (CTGC), which adopts a self-supervised surrogate task to extract critical, causal information from the original graph and enhance the cross-task generalizability of the condensed graph. Specifically, CTGC employs a dual-branch framework to disentangle the generation of the node attributes and graph structures, where a dedicated structural branch is designed to explicitly encode geometric information through nodes' positional embeddings. By implementing an alternating optimization scheme with contrastive loss terms, CTGC promotes the mutual enhancement of both branches and facilitates high-quality graph generation through the model inversion technique. Extensive experiments demonstrate that CTGC excels in handling various downstream tasks with a limited number of labels, consistently outperforming state-of-the-art GC methods.
Updated: 2025-10-08 12:49:19
标题: 对比图压缩:通过自监督学习推进数据多样性
摘要: 随着在大规模图上训练图神经网络(GNNs)的计算量增加,图压缩(GC)已经成为一种有前途的解决方案,用于合成大规模原始图的紧凑、替代图,以便高效地训练GNN。然而,现有的GC方法主要采用分类作为优化的代理任务,因此过度依赖节点标签,并限制它们在标签稀疏场景中的效用。更为关键的是,这种代理任务往往会在压缩图中过度拟合特定类别的信息,从而限制了GC在其他下游任务中的泛化能力。为了解决这些挑战,我们引入了对比图压缩(CTGC),它采用自监督代理任务从原始图中提取关键的因果信息,并增强压缩图的跨任务泛化能力。具体而言,CTGC采用双分支框架来解开节点属性和图结构的生成,其中设计了一个专门的结构分支来通过节点的位置嵌入明确编码几何信息。通过实现具有对比损失项的交替优化方案,CTGC促进了两个分支的相互增强,并通过模型反演技术促进了高质量图的生成。大量实验证明,CTGC在处理有限数量标签的各种下游任务方面表现出色,始终优于最先进的GC方法。
更新时间: 2025-10-08 12:49:19
领域: cs.LG
Accelerating Sparse Ternary GEMM for Quantized LLM inference on Apple Silicon
Sparse Ternary General Matrix-Matrix Multiplication (GEMM) remains under-optimized in existing libraries for Apple Silicon CPUs. We present a Sparse Ternary GEMM kernel optimized specifically for Apple's M-series processors. We propose a set of architecture-aware optimizations, including a novel blocked and interleaved sparse data format to improve memory locality, strategies to increase Instruction-Level Parallelism (ILP), and NEON-based Single Instruction Multiple Data (SIMD) vectorization to exploit data-level parallelism. Our scalar implementation achieves up to a 5.98x performance increase over a traditional Ternary Compressed Sparse Column (TCSC) baseline for large matrices with 50% ternary nonzero values (sparsity), reaching up to a 50.2% of the processor's theoretical peak performance, and remains stable across varying sparsity levels. Our vectorized implementation delivers up to a 5.59x performance increase for large matrices with 25% sparsity, and remains stable across varying sparsity levels.
Updated: 2025-10-08 12:42:07
标题: 加速稀疏三值GEMM,用于在苹果硅上进行量化LLM推断
摘要: 稀疏三值通用矩阵-矩阵乘法(GEMM)在现有的苹果硅芯CPU库中仍然未经优化。我们提出了一种专门针对苹果M系列处理器优化的稀疏三值GEMM核。我们提出了一组针对架构的优化,包括一种新颖的分块和交错稀疏数据格式,以改善内存局部性,增加指令级并行性(ILP)的策略,以及基于NEON的单指令多数据(SIMD)向量化以利用数据级并行性。我们的标量实现在具有50%三值非零值(稀疏性)的大矩阵上,相比传统的三值压缩稀疏列(TCSC)基准实现,性能提高了多达5.98倍,达到处理器理论峰值性能的50.2%,并在不同稀疏水平下保持稳定。我们的矢量化实现在具有25%稀疏性的大矩阵上,性能提高多达5.59倍,并在不同稀疏水平下保持稳定。
更新时间: 2025-10-08 12:42:07
领域: cs.PF,cs.LG
Domain Generalization by Rejecting Extreme Augmentations
Data augmentation is one of the most effective techniques for regularizing deep learning models and improving their recognition performance in a variety of tasks and domains. However, this holds for standard in-domain settings, in which the training and test data follow the same distribution. For the out-of-domain case, where the test data follow a different and unknown distribution, the best recipe for data augmentation is unclear. In this paper, we show that for out-of-domain and domain generalization settings, data augmentation can provide a conspicuous and robust improvement in performance. To do that, we propose a simple training procedure: (i) use uniform sampling on standard data augmentation transformations; (ii) increase the strength transformations to account for the higher data variance expected when working out-of-domain, and (iii) devise a new reward function to reject extreme transformations that can harm the training. With this procedure, our data augmentation scheme achieves a level of accuracy that is comparable to or better than state-of-the-art methods on benchmark domain generalization datasets. Code: https://github.com/Masseeh/DCAug
Updated: 2025-10-08 12:39:46
标题: 通过拒绝极端增广来进行域泛化
摘要: 数据增强是正规深度学习模型和改善它们在各种任务和领域中的识别性能的最有效技术之一。然而,这仅适用于标准领域内的情况,其中训练和测试数据遵循相同的分布。对于领域外的情况,其中测试数据遵循不同且未知的分布,数据增强的最佳方法尚不清楚。在本文中,我们展示了对于领域外和领域泛化设置,数据增强可以在性能上提供显著且稳健的改善。为此,我们提出了一个简单的训练过程:(i)在标准数据增强转换上使用均匀抽样;(ii)增加转换的强度,以考虑在领域外工作时所期望的更高的数据方差;(iii)设计一个新的奖励函数,以拒绝可能损害训练的极端转换。通过这个过程,我们的数据增强方案在基准领域泛化数据集上实现了与最先进方法相媲美或更好的精度水平。 代码:https://github.com/Masseeh/DCAug
更新时间: 2025-10-08 12:39:46
领域: cs.LG,cs.CV
High-Rate Mixout: Revisiting Mixout for Robust Domain Generalization
Ensembling fine-tuned models initialized from powerful pre-trained weights is a common strategy to improve robustness under distribution shifts, but it comes with substantial computational costs due to the need to train and store multiple models. Dropout offers a lightweight alternative by simulating ensembles through random neuron deactivation; however, when applied to pre-trained models, it tends to over-regularize and disrupt critical representations necessary for generalization. In this work, we investigate Mixout, a stochastic regularization technique that provides an alternative to Dropout for domain generalization. Rather than deactivating neurons, Mixout mitigates overfitting by probabilistically swapping a subset of fine-tuned weights with their pre-trained counterparts during training, thereby maintaining a balance between adaptation and retention of prior knowledge. Our study reveals that achieving strong performance with Mixout on domain generalization benchmarks requires a notably high masking probability of 0.9 for ViTs and 0.8 for ResNets. While this may seem like a simple adjustment, it yields two key advantages for domain generalization: (1) higher masking rates more strongly penalize deviations from the pre-trained parameters, promoting better generalization to unseen domains; and (2) high-rate masking substantially reduces computational overhead, cutting gradient computation by up to 45% and gradient memory usage by up to 90%. Experiments across five domain generalization benchmarks, PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet, using ResNet and ViT architectures, show that our approach, High-rate Mixout, achieves out-of-domain accuracy comparable to ensemble-based methods while significantly reducing training costs.
Updated: 2025-10-08 12:37:56
标题: 高速混合:再次审视混合以实现稳健的领域泛化
摘要: 将细调的模型集成化,该模型是从强大的预训练权重初始化的,是一种常见的策略,用于提高在分布变化下的稳健性,但由于需要训练和存储多个模型,因此会带来相当大的计算成本。Dropout通过随机神经元去激活来模拟集成,提供了一种轻量级的替代方案;然而,当应用于预训练模型时,它往往会过度正则化并破坏对泛化必要的关键表示。在这项工作中,我们研究了Mixout,这是一种提供领域泛化替代方案的随机正则化技术。与去激活神经元不同,Mixout通过在训练过程中以概率方式交换一部分细调权重与它们的预训练对应物,从而在适应性和保留先前知识之间保持平衡,以减少过拟合。我们的研究表明,在ViTs和ResNets中,通过Mixout实现强大的领域泛化性能需要显著高的掩蔽概率,分别为0.9和0.8。尽管这似乎是一个简单的调整,但它为领域泛化带来了两个关键优势:高掩蔽率更加严厉地惩罚偏离预训练参数,促进更好地泛化到未见领域;高率掩蔽大大减少了计算开销,将梯度计算减少了高达45%,梯度内存使用减少了高达90%。在ResNet和ViT架构上的五个领域泛化基准测试中,PACS、VLCS、OfficeHome、TerraIncognita和DomainNet,我们的High-rate Mixout方法显示出,与基于集成的方法相比,可以实现与领域泛化相关的准确性,并显著降低训练成本。
更新时间: 2025-10-08 12:37:56
领域: cs.LG,cs.CV
From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics
Although transformer-based models have shown exceptional empirical performance, the fundamental principles governing their training dynamics are inadequately characterized beyond configuration-specific studies. Inspired by empirical evidence showing improved reasoning capabilities under small initialization scales in language models, we employ the gradient flow analytical framework established in [Zhou et al. NeurIPS 2022] to systematically investigate linearized Transformer training dynamics. Our theoretical analysis dissects the dynamics of attention modules into two distinct stages. In the first stage, asymmetric weight perturbations from random initialization sustain non-degenerate gradient dynamics in parameter matrices, facilitating systematic escape from small initialization regimes. Subsequently, these matrices undergo condensation, progressively aligning toward the target orientation. In the second stage, the previously static key-query matrices actively participate in training, driving the normalized matrices toward asymptotic rank collapse. This two-stage framework generalizes classical directional convergence results.
Updated: 2025-10-08 12:37:53
标题: 从凝聚到秩崩溃:Transformer训练动态的两阶段分析
摘要: 尽管基于Transformer的模型在实证性能方面表现出色,但其训练动态的基本原则在超出特定配置研究的范围内尚未得到充分表征。受到实证证据表明语言模型在小初始化尺度下具有改进的推理能力的启发,我们采用[Zhou等人NeurIPS 2022]建立的梯度流分析框架系统地研究线性化Transformer训练动态。我们的理论分析将注意力模块的动态解剖为两个不同阶段。在第一阶段,来自随机初始化的不对称权重扰动维持参数矩阵中的非退化梯度动态,有助于系统性地逃离小初始化范围。随后,这些矩阵经历凝聚,逐渐朝向目标方向对齐。在第二阶段,先前静态的键-查询矩阵积极参与训练,驱动归一化矩阵朝向渐近秩坍缩。这个两阶段框架概括了经典的方向收敛结果。
更新时间: 2025-10-08 12:37:53
领域: cs.LG
I Can't Patch My OT Systems! A Look at CISA's KEVC Workarounds & Mitigations for OT
We examine the state of publicly available information about known exploitable vulnerabilities applicable to operational technology (OT) environments. Specifically, we analyze the Known Exploitable Vulnerabilities Catalog (KEVC) maintained by the US Department of Homeland Security Cybersecurity and Infrastructure Security Agency (CISA) to assess whether currently available data is sufficient for effective and reliable remediation in OT settings. Our team analyzed all KEVC entries through July 2025 to determine the extent to which OT environments can rely on existing remediation recommendations. We found that although most entries in the KEVC could affect OT environments, only 13% include vendor workarounds or mitigations as alternatives to patching. This paper also examines the feasibility of developing such alternatives based on vulnerability and exploit characteristics, and we present early evidence of success with this approach.
Updated: 2025-10-08 12:34:59
标题: 我无法为我的OT系统打补丁!审视CISA的KEVC OT系统的解决方法和缓解措施
摘要: 我们研究了关于已知可利用漏洞的公开信息在操作技术(OT)环境中的适用性。具体来说,我们分析了由美国国土安全部网络安全和基础设施安全局(CISA)维护的已知可利用漏洞目录(KEVC),以评估当前可用数据是否足以在OT环境中进行有效和可靠的修复。我们的团队分析了截至2025年7月的所有KEVC条目,以确定OT环境能否依赖现有的修复建议。我们发现,尽管KEVC中的大多数条目都可能影响OT环境,但只有13%包括供应商的解决方法或缓解措施作为修补的替代方案。本文还考察了基于漏洞和利用特征开发这类替代方案的可行性,并提供了此方法取得初步成功的证据。
更新时间: 2025-10-08 12:34:59
领域: cs.CR
Grouped Differential Attention
The self-attention mechanism, while foundational to modern Transformer architectures, suffers from a critical inefficiency: it frequently allocates substantial attention to redundant or noisy context. Differential Attention addressed this by using subtractive attention maps for signal and noise, but its required balanced head allocation imposes rigid constraints on representational flexibility and scalability. To overcome this, we propose Grouped Differential Attention (GDA), a novel approach that introduces unbalanced head allocation between signal-preserving and noise-control groups. GDA significantly enhances signal focus by strategically assigning more heads to signal extraction and fewer to noise-control, stabilizing the latter through controlled repetition (akin to GQA). This design achieves stronger signal fidelity with minimal computational overhead. We further extend this principle to group-differentiated growth, a scalable strategy that selectively replicates only the signal-focused heads, thereby ensuring efficient capacity expansion. Through large-scale pretraining and continual training experiments, we demonstrate that moderate imbalance ratios in GDA yield substantial improvements in generalization and stability compared to symmetric baselines. Our results collectively establish that ratio-aware head allocation and selective expansion offer an effective and practical path toward designing scalable, computation-efficient Transformer architectures.
Updated: 2025-10-08 12:32:28
标题: 分组差异性注意力
摘要: 自我注意机制是现代Transformer架构的基础,但存在一个关键的低效问题:它经常将大量关注力分配给冗余或嘈杂的上下文。差分注意力通过使用用于信号和噪声的减法注意力图解决了这个问题,但其需要平衡头部分配对表示灵活性和可扩展性施加了严格的约束。 为了克服这一问题,我们提出了分组差分注意力(GDA),这是一种新颖的方法,它在信号保留和噪声控制组之间引入了不平衡的头部分配。GDA通过将更多的头部分配给信号提取,较少头部分配给噪声控制,通过受控重复(类似于GQA)稳定后者,从而显著增强了信号聚焦。这种设计实现了更强的信号保真度,同时具有最小的计算开销。我们进一步将这一原则扩展到分组差异化增长,这是一种可扩展的策略,选择性复制只针对信号聚焦的头部,从而确保有效的容量扩展。 通过大规模预训练和持续训练实验,我们证明了GDA中适度不平衡比率相对于对称基线具有显著的泛化和稳定性改进。我们的结果共同确立了比率感知头部分配和选择性扩展为设计可扩展、计算高效的Transformer架构提供了一条有效且实际的路径。
更新时间: 2025-10-08 12:32:28
领域: cs.LG,cs.AI
Improving Neutral Point-of-View Generation with Data- and Parameter-Efficient RL
The paper shows that parameter-efficient reinforcement learning (PE-RL) is a highly effective training regime to improve large language models' (LLMs) ability to answer queries on sensitive topics with a Neutral Point of View (NPOV), i.e. to provide significantly more informative, diverse and impartial answers. This is shown by evaluating PE-RL and multiple strong baselines-including LoRA finetuning (strongest baseline), SFT and RLHF. PE-RL not only improves on overall NPOV quality compared to the strongest baseline ($97.06\%\rightarrow 99.08\%$), but also scores much higher on features linguists identify as key to separating sufficient answers from "great'' answers ($60.25\%\rightarrow 85.21\%$ for presence of supportive details, $68.74\%\rightarrow 91.43\%$ for absence of oversimplification). A qualitative analysis corroborates this. Moreover, our evaluation also finds a key property of PE-RL for this task: unlike methods that update all parameters, it generalises out of topic. Finally, to enable further studies we also release the dataset, SHQ-NPOV, and provide a methodology to create such datasets through iterative rounds of human peer-critique and annotator training.
Updated: 2025-10-08 12:30:55
标题: 使用数据和参数高效的强化学习改进中立观点生成
摘要: 这篇论文表明,参数高效的强化学习(PE-RL)是一种非常有效的训练方法,可以提高大型语言模型(LLMs)在处理敏感话题时以中立观点(NPOV)回答查询的能力,即提供更加信息丰富、多样化和公正的答案。通过评估PE-RL和多个强基线(包括LoRA微调(最强基线)、SFT和RLHF),证明了这一点。PE-RL不仅在整体NPOV质量上比最强基线有所改善($97.06\%\rightarrow 99.08\%$),而且在语言学家认为区分足够答案和“优秀”答案的关键特征上得分更高(支持性细节存在性从$60.25\%\rightarrow 85.21\%$,简化缺失率从$68.74\%\rightarrow 91.43\%$)。定性分析证实了这一点。此外,我们的评估还发现PE-RL在这一任务中的一个关键特性:与更新所有参数的方法不同,它可以泛化出主题。最后,为了促进进一步的研究,我们还发布了数据集SHQ-NPOV,并提供了一个通过人类同行批评和注释者培训的迭代轮次创建这种数据集的方法论。
更新时间: 2025-10-08 12:30:55
领域: cs.CL,cs.AI,cs.LG
Fisher Information, Training and Bias in Fourier Regression Models
Motivated by the growing interest in quantum machine learning, in particular quantum neural networks (QNNs), we study how recently introduced evaluation metrics based on the Fisher information matrix (FIM) are effective for predicting their training and prediction performance. We exploit the equivalence between a broad class of QNNs and Fourier models, and study the interplay between the \emph{effective dimension} and the \emph{bias} of a model towards a given task, investigating how these affect the model's training and performance. We show that for a model that is completely agnostic, or unbiased, towards the function to be learned, a higher effective dimension likely results in a better trainability and performance. On the other hand, for models that are biased towards the function to be learned a lower effective dimension is likely beneficial during training. To obtain these results, we derive an analytical expression of the FIM for Fourier models and identify the features controlling a model's effective dimension. This allows us to construct models with tunable effective dimension and bias, and to compare their training. We furthermore introduce a tensor network representation of the considered Fourier models, which could be a tool of independent interest for the analysis of QNN models. Overall, these findings provide an explicit example of the interplay between geometrical properties, model-task alignment and training, which are relevant for the broader machine learning community.
Updated: 2025-10-08 12:29:11
标题: 费舍尔信息、训练和傅立叶回归模型中的偏差
摘要: 受到对量子机器学习,特别是量子神经网络(QNNs)日益增长的兴趣的启发,我们研究了基于费舍尔信息矩阵(FIM)的最近引入的评估指标如何有效地预测它们的训练和预测性能。我们利用了广泛类别的QNNs和傅立叶模型之间的等价性,并研究了模型的“有效维度”和“偏差”对特定任务的影响,探讨了这些如何影响模型的训练和性能。我们发现,对于完全不可知或无偏向要学习的函数的模型,更高的有效维度可能会导致更好的可训练性和性能。另一方面,对于偏向要学习函数的模型,在训练过程中较低的有效维度可能有益。为了获得这些结果,我们推导了傅立叶模型的FIM的解析表达式,并确定了控制模型有效维度的特征。这使我们能够构建具有可调有效维度和偏差的模型,并比较它们的训练。此外,我们还介绍了所考虑的傅立叶模型的张量网络表示,这可能是用于分析QNN模型的独立感兴趣的工具。总的来说,这些发现提供了几何性质、模型-任务对齐和训练之间相互作用的明确示例,这对更广泛的机器学习社区是相关的。
更新时间: 2025-10-08 12:29:11
领域: cs.LG,cond-mat.dis-nn,physics.data-an,quant-ph
Revisiting Node Affinity Prediction in Temporal Graphs
Node affinity prediction is a common task that is widely used in temporal graph learning with applications in social and financial networks, recommender systems, and more. Recent works have addressed this task by adapting state-of-the-art dynamic link property prediction models to node affinity prediction. However, simple heuristics, such as Persistent Forecast or Moving Average, outperform these models. In this work, we analyze the challenges in training current Temporal Graph Neural Networks for node affinity prediction and suggest appropriate solutions. Combining the solutions, we develop NAViS - Node Affinity prediction model using Virtual State, by exploiting the equivalence between heuristics and state space models. While promising, training NAViS is non-trivial. Therefore, we further introduce a novel loss function for node affinity prediction. We evaluate NAViS on TGB and show that it outperforms the state-of-the-art, including heuristics. Our source code is available at https://github.com/orfeld415/NAVIS
Updated: 2025-10-08 12:21:52
标题: 重新审视时间图中的节点亲和性预测
摘要: 节点亲和性预测是一项常见任务,在社交和金融网络、推荐系统等领域中广泛使用。最近的研究通过将最先进的动态链接属性预测模型调整为节点亲和性预测模型来解决这一任务。然而,简单的启发式方法,如持续预测或移动平均,胜过这些模型。在本文中,我们分析了当前时间图神经网络在节点亲和性预测方面训练中的挑战,并提出了适当的解决方案。结合这些解决方案,我们开发了一个使用虚拟状态的节点亲和性预测模型NAViS,通过利用启发式和状态空间模型之间的等价性。尽管有希望,训练NAViS并不容易。因此,我们进一步引入了一种新颖的损失函数用于节点亲和性预测。我们在TGB上评估了NAViS,并展示其优于最先进技术,包括启发式方法。我们的源代码可在https://github.com/orfeld415/NAVIS上获得。
更新时间: 2025-10-08 12:21:52
领域: cs.LG
PyCFRL: A Python library for counterfactually fair offline reinforcement learning via sequential data preprocessing
Reinforcement learning (RL) aims to learn and evaluate a sequential decision rule, often referred to as a "policy", that maximizes the population-level benefit in an environment across possibly infinitely many time steps. However, the sequential decisions made by an RL algorithm, while optimized to maximize overall population benefits, may disadvantage certain individuals who are in minority or socioeconomically disadvantaged groups. To address this problem, we introduce PyCFRL, a Python library for ensuring counterfactual fairness in offline RL. PyCFRL implements a novel data preprocessing algorithm for learning counterfactually fair RL policies from offline datasets and provides tools to evaluate the values and counterfactual unfairness levels of RL policies. We describe the high-level functionalities of PyCFRL and demonstrate one of its major use cases through a data example. The library is publicly available on PyPI and Github (https://github.com/JianhanZhang/PyCFRL), and detailed tutorials can be found in the PyCFRL documentation (https://pycfrl-documentation.netlify.app).
Updated: 2025-10-08 12:19:10
标题: PyCFRL: 通过序列数据预处理实现反事实公平的Python库
摘要: 强化学习(RL)旨在学习和评估一个顺序决策规则,通常被称为“策略”,该策略在可能无限多个时间步中最大化环境中的人群级别利益。然而,RL算法所做的顺序决策,虽然优化以最大化整体人群利益,但可能会对处于少数群体或社会经济弱势群体的个体造成不利影响。为解决这一问题,我们引入了PyCFRL,这是一个用于确保离线RL中反事实公平性的Python库。PyCFRL实现了一种新颖的数据预处理算法,用于从离线数据集中学习反事实公平的RL策略,并提供工具来评估RL策略的价值和反事实不公平水平。我们描述了PyCFRL的高级功能,并通过一个数据示例展示了其主要用例之一。该库可以在PyPI和Github上公开获取(https://github.com/JianhanZhang/PyCFRL),PyCFRL文档中可以找到详细的教程(https://pycfrl-documentation.netlify.app)。
更新时间: 2025-10-08 12:19:10
领域: stat.ML,cs.LG
Textual interpretation of transient image classifications from large language models
Modern astronomical surveys deliver immense volumes of transient detections, yet distinguishing real astrophysical signals (for example, explosive events) from bogus imaging artefacts remains a challenge. Convolutional neural networks are effectively used for real versus bogus classification; however, their reliance on opaque latent representations hinders interpretability. Here we show that large language models (LLMs) can approach the performance level of a convolutional neural network on three optical transient survey datasets (Pan-STARRS, MeerLICHT and ATLAS) while simultaneously producing direct, human-readable descriptions for every candidate. Using only 15 examples and concise instructions, Google's LLM, Gemini, achieves a 93% average accuracy across datasets that span a range of resolution and pixel scales. We also show that a second LLM can assess the coherence of the output of the first model, enabling iterative refinement by identifying problematic cases. This framework allows users to define the desired classification behaviour through natural language and examples, bypassing traditional training pipelines. Furthermore, by generating textual descriptions of observed features, LLMs enable users to query classifications as if navigating an annotated catalogue, rather than deciphering abstract latent spaces. As next-generation telescopes and surveys further increase the amount of data available, LLM-based classification could help bridge the gap between automated detection and transparent, human-level understanding.
Updated: 2025-10-08 12:12:46
标题: 大型语言模型对瞬时图像分类的文本解释
摘要: 现代天文勘测提供了大量的瞬变检测数据,然而区分真实的天体信号(例如爆炸事件)和假象成像伪迹仍然是一个挑战。卷积神经网络被有效地用于真实与伪造分类;然而,它们对不透明潜在表示的依赖阻碍了可解释性。在这里,我们展示了大型语言模型(LLMs)可以在三个光学瞬变勘测数据集(Pan-STARRS、MeerLICHT和ATLAS)上接近卷积神经网络的性能水平,同时为每个候选对象生成直接、可读的描述。使用仅15个示例和简洁的说明,谷歌的LLM Gemini 在跨分辨率和像素尺度范围的数据集上实现了93%的平均准确率。我们还展示了第二个LLM可以评估第一个模型的输出的连贯性,通过识别问题案例实现迭代精炼。这个框架允许用户通过自然语言和示例定义所需的分类行为,绕过传统的训练管道。此外,通过生成观测特征的文本描述,LLMs使用户能够查询分类,就像浏览注释目录,而不是解码抽象的潜在空间。随着下一代望远镜和勘测进一步增加可用数据的数量,基于LLM的分类可能有助于弥合自动检测和透明、人类水平理解之间的差距。
更新时间: 2025-10-08 12:12:46
领域: astro-ph.IM,cs.LG
Real-Time Progress Prediction in Reasoning Language Models
Recent advances in reasoning language models -- particularly those that use long, latent chains of thought -- have demonstrated remarkable capabilities in complex, agentic tasks. However, as these models operate over increasingly extended time horizons, their internal progress becomes opaque to users, complicating expectation management and real-time oversight. In this work, we investigate whether real-time progress prediction is feasible. We discretize progress and train a linear probe to classify reasoning states. We then introduce a two-stage fine-tuning approach that enables reasoning models to generate progress estimates (0$\rightarrow$100\%) during inference. Our best fine-tuned model achieves an average error of 10\% for sequences less than 16,000 tokens, offering a practical mechanism for monitoring and interpreting model reasoning in real time.
Updated: 2025-10-08 12:11:48
标题: Reasoning语言模型中的实时进展预测
摘要: 最近推理语言模型的进展 - 尤其是那些使用长期的潜在思维链的模型 - 在复杂的主动任务中展现出了卓越的能力。然而,随着这些模型操作的时间跨度越来越长,它们内部的进展对用户变得不透明,使得预期管理和实时监督变得复杂。在这项工作中,我们调查了实时进度预测是否可行。我们对进度进行离散化,并训练一个线性探测器来分类推理状态。然后,我们引入了一个两阶段微调方法,使推理模型能够在推理过程中生成进度估计(0→100%)。我们最好的微调模型对于长度小于16,000个标记的序列平均误差为10%,为实时监测和解释模型推理提供了一个实用的机制。
更新时间: 2025-10-08 12:11:48
领域: cs.LG,cs.AI
Quantum Sparse Recovery and Quantum Orthogonal Matching Pursuit
We study quantum sparse recovery in non-orthogonal, overcomplete dictionaries: given coherent quantum access to a state and a dictionary of vectors, the goal is to reconstruct the state up to $\ell_2$ error using as few vectors as possible. We first show that the general recovery problem is NP-hard, ruling out efficient exact algorithms in full generality. To overcome this, we introduce Quantum Orthogonal Matching Pursuit (QOMP), the first quantum analogue of the classical OMP greedy algorithm. QOMP combines quantum subroutines for inner product estimation, maximum finding, and block-encoded projections with an error-resetting design that avoids iteration-to-iteration error accumulation. Under standard mutual incoherence and well-conditioned sparsity assumptions, QOMP provably recovers the exact support of a $K$-sparse state in polynomial time. As an application, we give the first framework for sparse quantum tomography with non-orthogonal dictionaries in $\ell_2$ norm, achieving query complexity $\widetilde{O}(\sqrt{N}/\epsilon)$ in favorable regimes and reducing tomography to estimating only $K$ coefficients instead of $N$ amplitudes. In particular, for pure-state tomography with $m=O(N)$ dictionary vectors and sparsity $K=\widetilde{O}(1)$ on a well-conditioned subdictionary, this circumvents the $\widetilde{\Omega}(N/\epsilon)$ lower bound that holds in the dense, orthonormal-dictionary setting, without contradiction, by leveraging sparsity together with non-orthogonality. Beyond tomography, we analyze QOMP in the QRAM model, where it yields polynomial speedups over classical OMP implementations, and provide a quantum algorithm to estimate the mutual incoherence of a dictionary of $m$ vectors in $O(m/\epsilon)$ queries, improving over both deterministic and quantum-inspired classical methods.
Updated: 2025-10-08 12:05:07
标题: 量子稀疏恢复和量子正交匹配追踪
摘要: 我们研究了在非正交、过完备字典中的量子稀疏恢复问题:给定对一个状态和一组向量的一致量子访问,目标是使用尽可能少的向量重构状态,使其$\ell_2$误差最小。我们首先展示了一般恢复问题是NP难的,排除了在完全一般情况下有效的精确算法。为了克服这一难题,我们引入了量子正交匹配追踪(QOMP),这是经典OMP贪婪算法的第一个量子模拟。QOMP结合了用于内积估计、最大值查找和块编码投影的量子子程序,同时采用了避免迭代误差累积的错误重置设计。在标准互不相关和条件良好的稀疏假设下,QOMP可以在多项式时间内可靠地恢复$K$-稀疏状态的确切支持。作为应用,我们提出了第一个在$\ell_2$范数下使用非正交字典进行稀疏量子层析的框架,在有利的情况下实现查询复杂度为$\widetilde{O}(\sqrt{N}/\epsilon)$,将层析问题减少到仅估计$K$个系数而不是$N$个振幅。特别地,对于在条件良好的子字典上具有$m=O(N)$字典向量和稀疏度$K=\widetilde{O}(1)$的纯态层析,通过利用稀疏性和非正交性,这避开了在密集、正交字典设置中成立的$\widetilde{\Omega}(N/\epsilon)$的下界,而不矛盾。除了层析,我们在QRAM模型中分析了QOMP,它相对于经典OMP实现提供了多项式加速,并提供了一个量子算法来估计$m$个向量字典的互不相关性,查询数量为$O(m/\epsilon)$,优于确定性和受量子启发的经典方法。
更新时间: 2025-10-08 12:05:07
领域: quant-ph,cs.DS,cs.LG
The Knowledge Complexity of Quantum Problems
Foundational results in theoretical computer science have established that everything provable, is provable in zero knowledge. However, this assertion fundamentally assumes a classical interpretation of computation and many interesting physical statements that one can hope to prove are not characterized. In this work, we consider decision problems, where the problem instance itself is specified by a (pure) quantum state. We discuss several motivating examples for this notion and, as our main technical result, we show that every quantum problem that is provable with an interactive protocol, is also provable in zero-knowledge. Our protocol achieves unconditional soundness and computational zero-knowledge, under standard assumptions in cryptography. In addition, we show how our techniques yield a protocol for the Uhlmann transformation problem that achieves a meaningful notion of zero-knowledge, also in the presence of a malicious verifier.
Updated: 2025-10-08 12:00:32
标题: 量子问题的知识复杂性
摘要: 在理论计算机科学中的基础性结果已经确定,一切可证明的事情都可以在零知识中被证明。然而,这种断言基本上假定了对计算的经典解释,而许多有趣的物理陈述,人们希望证明的,都没有被表征。在这项工作中,我们考虑决策问题,其中问题实例本身由一个(纯)量子态指定。我们讨论了几个激发这个概念的例子,并且作为我们的主要技术结果,我们展示了每个可以通过交互式协议证明的量子问题,也可以在零知识中被证明。我们的协议在密码学中的标准假设下实现了无条件的正确性和计算零知识。此外,我们展示了我们的技术如何产生一个关于Uhlmann变换问题的协议,它实现了一个有意义的零知识概念,即使在存在恶意验证者的情况下也是如此。
更新时间: 2025-10-08 12:00:32
领域: quant-ph,cs.CR
Train-Free Segmentation in MRI with Cubical Persistent Homology
We present a new general framework for segmentation of MRI scans based on Topological Data Analysis (TDA), offering several advantages over traditional machine learning approaches. The pipeline proceeds in three steps, first identifying the whole object to segment via automatic thresholding, then detecting a distinctive subset whose topology is known in advance, and finally deducing the various components of the segmentation. Unlike most prior TDA uses in medical image segmentation, which are typically embedded within deep networks, our approach is a standalone method tailored to MRI. A key ingredient is the localization of representative cycles from the persistence diagram, which enables interpretable mappings from topological features to anatomical components. In particular, the method offers the ability to perform segmentation without the need for large annotated datasets. Its modular design makes it adaptable to a wide range of data segmentation challenges. We validate the framework on three applications: glioblastoma segmentation in brain MRI, where a sphere is to be detected; myocardium in cardiac MRI, forming a cylinder; and cortical plate detection in fetal brain MRI, whose 2D slices are circles. We compare our method with established supervised and unsupervised baselines.
Updated: 2025-10-08 11:59:15
标题: MRI中基于立方持续同调的无需训练的分割
摘要: 我们提出了一种新的基于拓扑数据分析(TDA)的MRI扫描分割的通用框架,相比传统的机器学习方法,具有几个优点。该流程分为三个步骤,首先通过自动阈值确定要分割的整个对象,然后检测一个具有预先已知拓扑结构的独特子集,最后推断分割的各个组件。与大多数先前在医学图像分割中使用TDA的方法不同,这些方法通常嵌入在深度网络中,我们的方法是一种适用于MRI的独立方法。一个关键因素是从持久图中定位代表性周期,从而实现从拓扑特征到解剖组件的可解释的映射。特别地,该方法能够在不需要大量标注数据集的情况下进行分割。其模块化设计使其能够适应各种数据分割挑战。我们在三个应用中验证了该框架:脑部MRI中的胶质母细胞瘤分割,其中需要检测一个球体;心脏MRI中的心肌,形成一个圆柱体;以及胎儿脑部MRI中的皮质板检测,其2D切片为圆形。我们将我们的方法与已建立的监督和无监督基线进行比较。
更新时间: 2025-10-08 11:59:15
领域: eess.IV,cs.CG,cs.CV,cs.LG,55N31, 68-04, 92-08, 68U10
RevealNet: Distributed Traffic Correlation for Attack Attribution on Programmable Networks
Network attackers have increasingly resorted to proxy chains, VPNs, and anonymity networks to conceal their activities. To tackle this issue, past research has explored the applicability of traffic correlation techniques to perform attack attribution, i.e., to identify an attacker's true network location. However, current traffic correlation approaches rely on well-provisioned and centralized systems that ingest flows from multiple network probes to compute correlation scores. Unfortunately, this makes correlation efforts scale poorly for large high-speed networks. In this paper, we propose RevealNet, a decentralized framework for attack attribution that orchestrates a fleet of P4-programmable switches to perform traffic correlation. RevealNet builds on a set of correlation primitives inspired by prior work on computing and comparing flow sketches -- compact summaries of flows' key characteristics -- to enable efficient, distributed, in-network traffic correlation. Our evaluation suggests that RevealNet achieves comparable accuracy to centralized attack attribution systems while significantly reducing both the computational complexity and bandwidth overheads imposed by correlation tasks.
Updated: 2025-10-08 11:56:54
标题: RevealNet:在可编程网络上进行攻击归因的分布式流量关联
摘要: 网络攻击者越来越倾向于使用代理链、VPN和匿名网络来掩盖其活动。为了解决这个问题,过去的研究探讨了流量相关性技术在进行攻击归因方面的适用性,即识别攻击者的真实网络位置。然而,当前的流量相关性方法依赖于充分配备和集中式系统,这些系统从多个网络探针接收流量以计算相关性分数。不幸的是,这使得相关性工作在大型高速网络上扩展效果不佳。 在本文中,我们提出了RevealNet,一个用于攻击归因的分散框架,它编排一组可编程P4交换机来执行流量相关性。RevealNet建立在一组相关性基元之上,灵感来源于先前关于计算和比较流摘要的工作,这些摘要是流的关键特征的简洁总结,以实现高效的、分布式的、网络内的流量相关性。我们的评估表明,RevealNet实现了与集中式攻击归因系统相当的准确性,同时显著降低了相关性任务所施加的计算复杂性和带宽开销。
更新时间: 2025-10-08 11:56:54
领域: cs.CR,cs.NI
A Calibration-Free Fixed Point of Curved Boolean Logic Matching the Fine-Structure Constant
We show that Curved Boolean Logic (CBL) admits a calibration-free fixed point at which the per-face holonomy theta_0 is the same across independent minimal faces (CHSH, KCBS, SAT_6). Equality is enforced by solving the two-component system F(delta, gamma_4, gamma_5, gamma_6) = (theta_0^(4) - theta_0^(5), theta_0^(5) - theta_0^(6)) = 0 with a Gauss-Newton method (no external scale). A finite-difference Jacobian is full rank at the solution, implying local uniqueness. Working at the coupling level g = |theta_0|/(2*pi*n) removes hidden length factors; at the equality point our normalization audit shows g = alpha (Thomson limit) within numerical tolerance. The SU(1,1) corner words and overlap placements used to compute theta_0 are specified exactly; we also report a variational minimax analysis on g and a pilot non-backtracking spectral density that coincides numerically with the per-edge coupling, suggesting a purely topological formulation. Scope: the match is to the low-energy (Thomson) limit; a full spectral equality on the contextual complex is left as a short conjecture. These results promote the CBL--alpha connection from a calibrated identification to a calibration-free derivation candidate.
Updated: 2025-10-08 11:54:50
标题: 一个无需校准的曲线布尔逻辑固定点,匹配精细结构常数
摘要: 我们展示了弯曲布尔逻辑(CBL)在一个无需校准的不动点处,独立最小面(CHSH、KCBS、SAT_6)上的面向量角theta_0相同。通过解决两组分系统F(δ,γ_4,γ_5,γ_6)=(θ_0^4 - θ_0^5,θ_0^5 - θ_0^6)= 0,使用高斯-牛顿方法(无外部尺度)实现这种相等性。在解处,有限差分雅可比矩阵是完全秩的,暗示局部唯一性。在耦合水平g = |θ_0| /(2*pi*n)处工作去除了隐藏的长度因子;在平等点上,我们的标准化审计显示g = alpha(汤姆逊极限)在数值容差内。用于计算theta_0的SU(1,1)角词和重叠位置被精确指定;我们还报告了关于g的变分极小分析以及与边耦合数值一致的试验非回溯谱密度,这暗示了一个纯拓扑制定。范围:匹配是对低能(汤姆逊)极限的;对于上下文复杂性的完整谱相等性留作短期推测。这些结果将CBL-α连接从校准鉴定提升为无需校准的推导候选。
更新时间: 2025-10-08 11:54:50
领域: cs.LO,cs.AI,cs.CC,quant-ph,68Q17, 68Q25,F.1.1; F.2.2; I.2.3
Bayesian Nonparametric Dynamical Clustering of Time Series
We present a method that models the evolution of an unbounded number of time series clusters by switching among an unknown number of regimes with linear dynamics. We develop a Bayesian non-parametric approach using a hierarchical Dirichlet process as a prior on the parameters of a Switching Linear Dynamical System and a Gaussian process prior to model the statistical variations in amplitude and temporal alignment within each cluster. By modeling the evolution of time series patterns, the method avoids unnecessary proliferation of clusters in a principled manner. We perform inference by formulating a variational lower bound for off-line and on-line scenarios, enabling efficient learning through optimization. We illustrate the versatility and effectiveness of the approach through several case studies of electrocardiogram analysis using publicly available databases.
Updated: 2025-10-08 11:52:39
标题: 贝叶斯非参数动态时间序列聚类
摘要: 我们提出了一种方法,通过在未知数量的制度之间切换来模拟无限数量的时间序列集群的演变,这些制度具有线性动态。我们采用贝叶斯非参数方法,使用层次狄利克雷过程作为切换线性动态系统参数的先验,并使用高斯过程先验来模拟每个集群内振幅和时间对齐的统计变化。通过模拟时间序列模式的演变,该方法以原则性的方式避免了集群的不必要繁殖。我们通过为离线和在线场景制定变分下界来进行推断,从而实现了通过优化进行高效学习。我们通过使用公开可用的数据库进行的多个心电图分析案例研究展示了该方法的灵活性和有效性。
更新时间: 2025-10-08 11:52:39
领域: stat.ML,cs.AI,cs.LG,stat.AP,I.5; I.2.1
Automating RT Planning at Scale: High Quality Data For AI Training
Radiotherapy (RT) planning is complex, subjective, and time-intensive. Advances with artificial intelligence (AI) promise to improve its precision and efficiency, but progress is often limited by the scarcity of large, standardized datasets. To address this, we introduce the Automated Iterative RT Planning (AIRTP) system, a scalable solution for generating high-quality treatment plans. This scalable solution is designed to generate substantial volumes of consistently high-quality treatment plans, overcoming a key obstacle in the advancement of AI-driven RT planning. Our AIRTP pipeline adheres to clinical guidelines and automates essential steps, including organ-at-risk (OAR) contouring, helper structure creation, beam setup, optimization, and plan quality improvement, using AI integrated with RT planning software like Varian Eclipse. Furthermore, a novel approach for determining optimization parameters to reproduce 3D dose distributions, i.e. a method to convert dose predictions to deliverable treatment plans constrained by machine limitations is proposed. A comparative analysis of plan quality reveals that our automated pipeline produces treatment plans of quality comparable to those generated manually, which traditionally require several hours of labor per plan. Committed to public research, the first data release of our AIRTP pipeline includes nine cohorts covering head-and-neck and lung cancer sites to support an AAPM 2025 challenge. To our best knowledge, this dataset features more than 10 times number of plans compared to the largest existing well-curated public dataset. Repo: https://github.com/RiqiangGao/GDP-HMM_AAPMChallenge.
Updated: 2025-10-08 11:49:31
标题: 大规模自动化放射治疗计划:为人工智能训练提供高质量数据
摘要: 放射治疗(RT)计划是复杂的、主观的,并且耗时。人工智能(AI)的进步承诺提高其精确性和效率,但进展通常受到大型标准化数据集的稀缺性的限制。为了解决这个问题,我们引入了自动迭代RT规划(AIRTP)系统,这是一个可扩展的解决方案,用于生成高质量的治疗计划。这种可扩展的解决方案旨在生成大量一致高质量的治疗计划,克服了AI驱动的RT规划进展中的一个关键障碍。我们的AIRTP流水线遵循临床指南,自动化关键步骤,包括风险器官(OAR)轮廓绘制、辅助结构创建、束设置、优化和计划质量改进,使用与Varian Eclipse等RT规划软件集成的AI。此外,提出了一种确定优化参数以重现3D剂量分布的新方法,即一种将剂量预测转换为受机器限制约束的可交付治疗计划的方法。计划质量的比较分析显示,我们的自动化流水线生成的治疗计划质量与手动生成的计划相当,传统上每个计划需要数小时的工作。致力于公共研究,我们的AIRTP流水线的第一批数据发布包括涵盖头颈部和肺癌部位的九个队列,以支持AAPM 2025挑战。据我们所知,与最大的现有精心策划的公共数据集相比,该数据集包含的计划数量超过10倍。Repo: https://github.com/RiqiangGao/GDP-HMM_AAPMChallenge.
更新时间: 2025-10-08 11:49:31
领域: cs.HC,cs.LG,cs.RO
DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning
Realistic traffic simulation is critical for the development of autonomous driving systems and urban mobility planning, yet existing imitation learning approaches often fail to model realistic traffic behaviors. Behavior cloning suffers from covariate shift, while Generative Adversarial Imitation Learning (GAIL) is notoriously unstable in multi-agent settings. We identify a key source of this instability: irrelevant interaction misguidance, where a discriminator penalizes an ego vehicle's realistic behavior due to unrealistic interactions among its neighbors. To address this, we propose Decomposed Multi-agent GAIL (DecompGAIL), which explicitly decomposes realism into ego-map and ego-neighbor components, filtering out misleading neighbor: neighbor and neighbor: map interactions. We further introduce a social PPO objective that augments ego rewards with distance-weighted neighborhood rewards, encouraging overall realism across agents. Integrated into a lightweight SMART-based backbone, DecompGAIL achieves state-of-the-art performance on the WOMD Sim Agents 2025 benchmark.
Updated: 2025-10-08 11:46:39
标题: DecompGAIL:使用分解的多智能生成对抗模仿学习学习真实交通行为
摘要: 实际交通仿真对于自动驾驶系统和城市移动规划的发展至关重要,然而现有的模仿学习方法往往无法模拟真实的交通行为。行为克隆存在协变量转移的问题,而生成对抗性模仿学习(GAIL)在多智能体环境中不稳定。我们确定了这种不稳定性的一个关键来源:不相关的交互误导,即鉴别器由于邻居之间的不真实交互而惩罚自我车辆的真实行为。为了解决这个问题,我们提出了分解多智能体GAIL(DecompGAIL),明确将真实性分解为自我地图和自我邻居组件,过滤掉误导性的邻居:邻居和邻居:地图交互。我们进一步引入了一个社交PPO目标,将自我奖励与距离加权的邻域奖励相结合,鼓励整体代理之间的实际性。集成到轻量级的SMART基础上,DecompGAIL在WOMD Sim Agents 2025基准测试中实现了最先进的性能。
更新时间: 2025-10-08 11:46:39
领域: cs.LG,cs.AI,cs.RO
Utilizing Large Language Models for Machine Learning Explainability
This study explores the explainability capabilities of large language models (LLMs), when employed to autonomously generate machine learning (ML) solutions. We examine two classification tasks: (i) a binary classification problem focused on predicting driver alertness states, and (ii) a multilabel classification problem based on the yeast dataset. Three state-of-the-art LLMs (i.e. OpenAI GPT, Anthropic Claude, and DeepSeek) are prompted to design training pipelines for four common classifiers: Random Forest, XGBoost, Multilayer Perceptron, and Long Short-Term Memory networks. The generated models are evaluated in terms of predictive performance (recall, precision, and F1-score) and explainability using SHAP (SHapley Additive exPlanations). Specifically, we measure Average SHAP Fidelity (Mean Squared Error between SHAP approximations and model outputs) and Average SHAP Sparsity (number of features deemed influential). The results reveal that LLMs are capable of producing effective and interpretable models, achieving high fidelity and consistent sparsity, highlighting their potential as automated tools for interpretable ML pipeline generation. The results show that LLMs can produce effective, interpretable pipelines with high fidelity and consistent sparsity, closely matching manually engineered baselines.
Updated: 2025-10-08 11:46:23
标题: 利用大型语言模型进行机器学习的可解释性解释
摘要: 这项研究探讨了大型语言模型(LLMs)在自主生成机器学习(ML)解决方案时的可解释性能力。我们考察了两个分类任务:(i)一个针对预测驾驶员警觉状态的二元分类问题,和(ii)基于酵母数据集的多标签分类问题。我们要求三种最先进的LLMs(即OpenAI GPT、Anthropic Claude和DeepSeek)为四种常见分类器设计训练管道:随机森林、XGBoost、多层感知器和长短期记忆网络。生成的模型根据预测性能(召回率、精确率和F1分数)和可解释性(使用SHAP进行解释)进行评估。具体来说,我们衡量了平均SHAP保真度(SHAP近似值与模型输出之间的均方误差)和平均SHAP稀疏性(被认为有影响的特征数量)。结果显示,LLMs能够产生有效且可解释的模型,实现高度的保真度和一致的稀疏性,凸显它们作为可解释ML管道生成的自动化工具的潜力。结果表明,LLMs能够生成具有高保真度和一致稀疏性的有效且可解释的管道,与手动设计的基线模型非常接近。
更新时间: 2025-10-08 11:46:23
领域: cs.LG
Vacuum Spiker: A Spiking Neural Network-Based Model for Efficient Anomaly Detection in Time Series
Anomaly detection is a key task across domains such as industry, healthcare, and cybersecurity. Many real-world anomaly detection problems involve analyzing multiple features over time, making time series analysis a natural approach for such problems. While deep learning models have achieved strong performance in this field, their trend to exhibit high energy consumption limits their deployment in resource-constrained environments such as IoT devices, edge computing platforms, and wearables. To address this challenge, this paper introduces the \textit{Vacuum Spiker algorithm}, a novel Spiking Neural Network-based method for anomaly detection in time series. It incorporates a new detection criterion that relies on global changes in neural activity rather than reconstruction or prediction error. It is trained using Spike Time-Dependent Plasticity in a novel way, intended to induce changes in neural activity when anomalies occur. A new efficient encoding scheme is also proposed, which discretizes the input space into non-overlapping intervals, assigning each to a single neuron. This strategy encodes information with a single spike per time step, improving energy efficiency compared to conventional encoding methods. Experimental results on publicly available datasets show that the proposed algorithm achieves competitive performance while significantly reducing energy consumption, compared to a wide set of deep learning and machine learning baselines. Furthermore, its practical utility is validated in a real-world case study, where the model successfully identifies power curtailment events in a solar inverter. These results highlight its potential for sustainable and efficient anomaly detection.
Updated: 2025-10-08 11:43:54
标题: 真空尖峰器:基于脉冲神经网络的模型,用于时序数据高效异常检测
摘要: 异常检测是跨行业、医疗保健和网络安全等领域的关键任务。许多现实世界的异常检测问题涉及随时间分析多个特征,因此时间序列分析是解决这类问题的一种自然方法。虽然深度学习模型在该领域取得了强大的性能,但它们倾向于展示高能耗,限制了它们在资源受限环境(如物联网设备、边缘计算平台和可穿戴设备)中的部署。为了解决这一挑战,本文引入了“真空尖峰算法”,这是一种基于尖峰神经网络的时间序列异常检测方法。它结合了一种新的检测标准,该标准依赖于神经活动的全局变化,而不是重建或预测误差。它使用一种新颖的方式训练,利用时序相关塑性来在异常发生时诱导神经活动的变化。还提出了一种新的高效编码方案,将输入空间离散化为不重叠的区间,并将每个区间分配给一个单独的神经元。与传统的编码方法相比,这种策略在每个时间步只使用一个尖峰来编码信息,提高了能效。对公开可用数据集的实验结果表明,所提出的算法在显著减少能耗的同时取得了竞争性能,与广泛的深度学习和机器学习基线相比。此外,在一个真实案例研究中验证了其实用性,模型成功地识别了太阳逆变器中的功率削减事件。这些结果凸显了其在可持续和高效的异常检测方面的潜力。
更新时间: 2025-10-08 11:43:54
领域: cs.LG,I.2; I.5
Angular Constraint Embedding via SpherePair Loss for Constrained Clustering
Constrained clustering integrates domain knowledge through pairwise constraints. However, existing deep constrained clustering (DCC) methods are either limited by anchors inherent in end-to-end modeling or struggle with learning discriminative Euclidean embedding, restricting their scalability and real-world applicability. To avoid their respective pitfalls, we propose a novel angular constraint embedding approach for DCC, termed SpherePair. Using the SpherePair loss with a geometric formulation, our method faithfully encodes pairwise constraints and leads to embeddings that are clustering-friendly in angular space, effectively separating representation learning from clustering. SpherePair preserves pairwise relations without conflict, removes the need to specify the exact number of clusters, generalizes to unseen data, enables rapid inference of the number of clusters, and is supported by rigorous theoretical guarantees. Comparative evaluations with state-of-the-art DCC methods on diverse benchmarks, along with empirical validation of theoretical insights, confirm its superior performance, scalability, and overall real-world effectiveness. Code is available at \href{https://github.com/spherepaircc/SpherePairCC/tree/main}{our repository}.
Updated: 2025-10-08 11:43:20
标题: 通过SpherePair Loss进行角度约束嵌入以实现约束聚类
摘要: 受限制的聚类通过成对约束整合领域知识。然而,现有的深度受限制聚类(DCC)方法要么受到端到端建模中固有的锚点的限制,要么难以学习有区分性的欧氏嵌入,限制了它们的可扩展性和实际应用性。为了避免各自的缺点,我们提出了一种新颖的用于DCC的角度约束嵌入方法,称为SpherePair。使用几何公式的SpherePair损失,我们的方法忠实地编码了成对约束,并导致了在角度空间中友好聚类的嵌入,有效地将表示学习与聚类分离开来。SpherePair保留了成对关系而没有冲突,消除了需要指定确切聚类数量的需要,泛化到未见数据,能够快速推断出聚类数量,并得到了严格的理论保证支持。与最先进的DCC方法在多样的基准测试上进行比较评估,以及对理论见解的实证验证,证实了它在性能、可扩展性和整体实际有效性方面的优越表现。代码可在我们的存储库\href{https://github.com/spherepaircc/SpherePairCC/tree/main}{下载}。
更新时间: 2025-10-08 11:43:20
领域: cs.LG,cs.AI,cs.CV
Tempo: Compiled Dynamic Deep Learning with Symbolic Dependence Graphs
Deep learning (DL) algorithms are often defined in terms of temporal relationships: a tensor at one timestep may depend on tensors from earlier or later timesteps. Such dynamic dependencies (and corresponding dynamic tensor shapes) are difficult to express and optimize: while eager DL systems support such dynamism, they cannot apply compiler-based optimizations; graph-based systems require static tensor shapes, which forces users to pad tensors or break-up programs into multiple static graphs. We describe Tempo, a new DL system that combines the dynamism of eager execution with the whole-program optimizations of graph-based compilation. Tempo achieves this through a declarative programming model with recurrent tensors, which include explicit temporal dimensions. Temporal dimensions can be indexed using symbolic expressions to express dynamic dependencies on past and future tensors. Based on this, Tempo constructs a symbolic dependence graph, which concisely encodes dynamic dependencies between operators, and applies whole-program optimizations, such as algebraic simplifications, vectorization, tiling, and fusion. By tiling dynamic dependencies into static-size blocks, Tempo can also reuse existing static code-generators. It then uses a polyhedral model to find a feasible execution schedule, which includes memory management operations. We show that Tempo achieves a 7$\times$ speedup over JAX for Llama-3.2-3B decoding; for reinforcement learning algorithms, Tempo achieves a 54$\times$ speedup, with 16$\times$ lower peak memory usage.
Updated: 2025-10-08 11:36:12
标题: 速度:使用符号依赖图编译动态深度学习
摘要: 深度学习(DL)算法通常通过时间关系来定义:一个时间步的张量可能依赖于较早或较晚时间步的张量。这种动态依赖关系(以及相应的动态张量形状)很难表达和优化:虽然急切的DL系统支持这种动态性,但无法应用基于编译器的优化;基于图的系统需要静态张量形状,这迫使用户填充张量或将程序分解成多个静态图。 我们描述了Tempo,一个新的DL系统,它将急切执行的动态性与基于图的编译的整个程序优化相结合。Tempo通过具有循环张量的声明式编程模型实现了这一点,其中包括显式的时间维度。时间维度可以使用符号表达式进行索引,以表达对过去和未来张量的动态依赖关系。基于此,Tempo构建了一个符号依赖图,简洁地编码了运算符之间的动态依赖关系,并应用整个程序优化,例如代数简化、矢量化、平铺和融合。通过将动态依赖关系划分为静态大小的块,Tempo还可以重复使用现有的静态代码生成器。然后,它使用一个多面体模型找到可行的执行时间表,其中包括内存管理操作。我们展示了Tempo在Llama-3.2-3B解码中实现了7倍的加速;对于强化学习算法,Tempo实现了54倍的加速,内存使用峰值降低了16倍。
更新时间: 2025-10-08 11:36:12
领域: cs.DC,cs.AI,cs.LG,I.2; I.1
Who Pays for Fairness? Rethinking Recourse under Social Burden
Machine learning based predictions are increasingly used in sensitive decision-making applications that directly affect our lives. This has led to extensive research into ensuring the fairness of classifiers. Beyond just fair classification, emerging legislation now mandates that when a classifier delivers a negative decision, it must also offer actionable steps an individual can take to reverse that outcome. This concept is known as algorithmic recourse. Nevertheless, many researchers have expressed concerns about the fairness guarantees within the recourse process itself. In this work, we provide a holistic theoretical characterization of unfairness in algorithmic recourse, formally linking fairness guarantees in recourse and classification, and highlighting limitations of the standard equal cost paradigm. We then introduce a novel fairness framework based on social burden, along with a practical algorithm (MISOB), broadly applicable under real-world conditions. Empirical results on real-world datasets show that MISOB reduces the social burden across all groups without compromising overall classifier accuracy.
Updated: 2025-10-08 11:28:46
标题: 谁为公平买单?重新思考社会负担下的补救措施
摘要: 基于机器学习的预测越来越多地应用于直接影响我们生活的敏感决策应用中。这导致了对分类器公平性的广泛研究。除了公平分类之外,新兴的立法现在要求当分类器做出负面决定时,它必须提供个人可以采取的可行步骤来扭转结果。这一概念被称为算法补救。然而,许多研究人员对补救过程本身中的公平性保证表示担忧。在这项工作中,我们提供了对算法补救中的不公平性的整体理论表征,正式地将补救和分类中的公平性保证联系起来,并突显了标准平等成本范式的局限性。然后,我们引入了一个基于社会负担的新型公平框架,以及一个实用算法(MISOB),在真实世界条件下广泛适用。真实世界数据集的实证结果表明,MISOB减少了所有群体的社会负担,而不影响整体分类器的准确性。
更新时间: 2025-10-08 11:28:46
领域: cs.LG,cs.CY
GenFacts-Generative Counterfactual Explanations for Multi-Variate Time Series
Counterfactual explanations aim to enhance model transparency by illustrating how input modifications can change model predictions. In the multivariate time series domain, existing approaches often produce counterfactuals that lack validity, plausibility, or intuitive interpretability. We present \textbf{GenFacts}, a novel generative framework for producing plausible and actionable counterfactual explanations for time series classifiers. GenFacts introduces a structured approach to latent space modeling and targeted counterfactual synthesis. We evaluate GenFacts on radar gesture recognition as an industrial use case and handwritten letter trajectories as an intuitive benchmark. Across both datasets, GenFacts consistently outperforms baseline methods in plausibility metrics (+18.7\%) and achieves the highest interpretability scores in user studies. These results underscore that realism and user-centered interpretability, rather than sparsity alone, are vital for actionable counterfactuals in time series applications.
Updated: 2025-10-08 11:16:15
标题: GenFacts-多变量时间序列的生成对策解释
摘要: 反事实解释旨在通过说明输入修改如何改变模型预测来增强模型的透明度。在多变量时间序列领域,现有方法常常产生缺乏有效性、合理性或直观可解释性的反事实。我们提出了一种新颖的生成框架\textbf{GenFacts},用于为时间序列分类器生成合理且可操作的反事实解释。GenFacts引入了一种结构化的潜在空间建模和有针对性的反事实合成方法。 我们将GenFacts在工业使用案例雷达手势识别和直观基准手写字母轨迹上进行评估。在两个数据集中,GenFacts在合理性指标上始终优于基线方法(+18.7\%),并在用户研究中获得最高的可解释性评分。这些结果强调,在时间序列应用中,现实主义和以用户为中心的可解释性,而不仅仅是稀疏性,对于可操作的反事实至关重要。
更新时间: 2025-10-08 11:16:15
领域: cs.LG
NAR-*ICP: Neural Execution of Classical ICP-based Pointcloud Registration Algorithms
This study explores the intersection of neural networks and classical robotics algorithms through the Neural Algorithmic Reasoning (NAR) blueprint, enabling the training of neural networks to reason like classical robotics algorithms by learning to execute them. Algorithms are integral to robotics and safety-critical applications due to their predictable and consistent performance through logical and mathematical principles. In contrast, while neural networks are highly adaptable, handling complex, high-dimensional data and generalising across tasks, they often lack interpretability and transparency in their internal computations. To bridge the two, we propose a novel Graph Neural Network (GNN)-based framework, NAR-*ICP, that learns the intermediate computations of classical ICP-based registration algorithms, extending the CLRS Benchmark. We evaluate our approach across real-world and synthetic datasets, demonstrating its flexibility in handling complex inputs, and its potential to be used within larger learning pipelines. Our method achieves superior performance compared to the baselines, even surpassing the algorithms it was trained on, further demonstrating its ability to generalise beyond the capabilities of traditional algorithms.
Updated: 2025-10-08 11:15:48
标题: NAR-*ICP:基于神经网络的经典ICP点云配准算法执行
摘要: 这项研究通过神经网络和经典机器人算法的交叉点,通过神经算法推理(NAR)蓝图,实现了训练神经网络像经典机器人算法一样进行推理,学习执行它们。算法对机器人和安全关键应用至关重要,因为它们通过逻辑和数学原理具有可预测和一致的性能。相比之下,神经网络具有高度适应性,处理复杂的高维数据并在任务之间泛化,但它们在内部计算中常常缺乏可解释性和透明度。为了弥合这两者之间的差距,我们提出了一种基于图神经网络(GNN)的新型框架,NAR-*ICP,学习经典ICP-based注册算法的中间计算,扩展了CLRS基准。我们在真实世界和合成数据集上评估了我们的方法,展示了它在处理复杂输入方面的灵活性,以及在更大的学习管道中使用的潜力。我们的方法表现优异,甚至超过了训练过的算法,进一步展示了其能够超越传统算法的能力。
更新时间: 2025-10-08 11:15:48
领域: cs.RO,cs.AI,cs.LG
Multi-Dimensional Autoscaling of Stream Processing Services on Edge Devices
Edge devices have limited resources, which inevitably leads to situations where stream processing services cannot satisfy their needs. While existing autoscaling mechanisms focus entirely on resource scaling, Edge devices require alternative ways to sustain the Service Level Objectives (SLOs) of competing services. To address these issues, we introduce a Multi-dimensional Autoscaling Platform (MUDAP) that supports fine-grained vertical scaling across both service- and resource-level dimensions. MUDAP supports service-specific scaling tailored to available parameters, e.g., scale data quality or model size for a particular service. To optimize the execution across services, we present a scaling agent based on Regression Analysis of Structural Knowledge (RASK). The RASK agent efficiently explores the solution space and learns a continuous regression model of the processing environment for inferring optimal scaling actions. We compared our approach with two autoscalers, the Kubernetes VPA and a reinforcement learning agent, for scaling up to 9 services on a single Edge device. Our results showed that RASK can infer an accurate regression model in merely 20 iterations (i.e., observe 200s of processing). By increasingly adding elasticity dimensions, RASK sustained the highest request load with 28% less SLO violations, compared to baselines.
Updated: 2025-10-08 10:51:50
标题: 边缘设备上流处理服务的多维自动扩展
摘要: 边缘设备资源有限,这不可避免地导致流处理服务无法满足它们的需求的情况。虽然现有的自动扩展机制完全专注于资源扩展,但边缘设备需要替代方法来维持竞争服务的服务级目标(SLOs)。为了解决这些问题,我们引入了一个支持服务和资源级细粒度垂直扩展的多维自动扩展平台(MUDAP)。MUDAP支持针对可用参数定制的服务特定扩展,例如,为特定服务扩展数据质量或模型大小。为了优化跨服务的执行,我们提出了基于结构知识回归分析(RASK)的扩展代理。RASK代理有效地探索解决空间,并学习一个连续的回归模型,用于推断最佳的扩展动作。我们将我们的方法与两种自动扩展器进行了比较,即Kubernetes VPA和强化学习代理,在单个边缘设备上扩展达到9个服务。我们的结果显示,RASK可以在仅20次迭代(即观察200秒的处理)中推断出准确的回归模型。通过不断增加弹性维度,RASK在SLO违规率较基准低28%的情况下,维持了最高的请求负载。
更新时间: 2025-10-08 10:51:50
领域: cs.DC,cs.AI,cs.LG,cs.PF
Edit-Based Flow Matching for Temporal Point Processes
Temporal point processes (TPPs) are a fundamental tool for modeling event sequences in continuous time, but most existing approaches rely on autoregressive parameterizations that are limited by their sequential sampling. Recent non-autoregressive, diffusion-style models mitigate these issues by jointly interpolating between noise and data through event insertions and deletions in a discrete Markov chain. In this work, we generalize this perspective and introduce an Edit Flow process for TPPs that transports noise to data via insert, delete, and substitute edit operations. By learning the instantaneous edit rates within a continuous-time Markov chain framework, we attain a flexible and efficient model that effectively reduces the total number of necessary edit operations during generation. Empirical results demonstrate the generative flexibility of our unconditionally trained model in a wide range of unconditional and conditional generation tasks on benchmark TPPs.
Updated: 2025-10-08 10:51:35
标题: 基于编辑的流匹配用于时间点过程
摘要: 时间点过程(TPPs)是在连续时间中建模事件序列的基本工具,但大多数现有方法依赖于受其顺序采样限制的自回归参数化。最近的非自回归、扩散式模型通过在离散马尔可夫链中通过事件插入和删除共同插值噪声和数据来缓解这些问题。在这项工作中,我们概括了这一观点,并为TPPs引入了一个编辑流程,通过插入、删除和替换编辑操作将噪声传递到数据。通过在连续时间马尔可夫链框架内学习瞬时编辑速率,我们获得了一个灵活高效的模型,有效地减少了生成过程中必要的编辑操作总数。实证结果展示了我们无条件训练模型在广泛的无条件和有条件生成任务中的生成灵活性。
更新时间: 2025-10-08 10:51:35
领域: cs.LG
Inference-Time Scaling of Discrete Diffusion Models via Importance Weighting and Optimal Proposal Design
Discrete diffusion models have become highly effective across various domains. However, real-world applications often require the generative process to adhere to certain constraints. To this end, we propose a Sequential Monte Carlo (SMC) framework that enables scalable inference-time control of discrete diffusion models through principled importance weighting and optimal proposal construction. Specifically, our approach derives tractable importance weights for a range of intermediate targets and characterises the optimal proposal, for which we develop two practical approximations: a first-order gradient-based approximation and an amortised proposal trained to minimise the log-variance of the importance weights. Empirical results across synthetic tasks, language modelling, biology design, and text-to-image generation demonstrate that our framework enhances controllability and sample quality, highlighting the effectiveness of SMC as a versatile recipe for scaling discrete diffusion models at inference time.
Updated: 2025-10-08 10:49:29
标题: 离散扩散模型的推理时间缩放:通过重要性加权和最佳提议设计
摘要: 离散扩散模型在各个领域已经变得非常有效。然而,现实世界的应用通常需要生成过程遵守某些约束。为此,我们提出了一种顺序蒙特卡洛(SMC)框架,通过原则性的重要性加权和最佳提议构建,实现对离散扩散模型进行可伸缩的推理时间控制。具体来说,我们的方法为一系列中间目标导出了可处理的重要性权重,并表征了最佳提议,我们开发了两种实用的近似方法:基于梯度的一阶逼近和训练以最小化重要性权重的对数方差的摊薄提议。通过合成任务、语言建模、生物设计以及文本到图像生成等经验结果表明,我们的框架提高了可控性和样本质量,突显了SMC作为一个多功能的方法,在推理时间扩展离散扩散模型方面的有效性。
更新时间: 2025-10-08 10:49:29
领域: cs.LG
MoRE-GNN: Multi-omics Data Integration with a Heterogeneous Graph Autoencoder
The integration of multi-omics single-cell data remains challenging due to high-dimensionality and complex inter-modality relationships. To address this, we introduce MoRE-GNN (Multi-omics Relational Edge Graph Neural Network), a heterogeneous graph autoencoder that combines graph convolution and attention mechanisms to dynamically construct relational graphs directly from data. Evaluations on six publicly available datasets demonstrate that MoRE-GNN captures biologically meaningful relationships and outperforms existing methods, particularly in settings with strong inter-modality correlations. Furthermore, the learned representations allow for accurate downstream cross-modal predictions. While performance may vary with dataset complexity, MoRE-GNN offers an adaptive, scalable and interpretable framework for advancing multi-omics integration.
Updated: 2025-10-08 10:48:15
标题: MoRE-GNN: 使用异质图自编码器进行多组学数据整合
摘要: 多组学单细胞数据的整合仍然具有挑战性,因为其高维度和复杂的模态间关系。为了解决这个问题,我们引入了MoRE-GNN(多组学关系边图神经网络),这是一个异构图自动编码器,结合了图卷积和注意机制,可以直接从数据中动态构建关系图。对六个公开可用的数据集进行评估表明,MoRE-GNN捕捉了生物意义上的关系,并在特别具有强模态间相关性的情况下胜过现有方法。此外,学习到的表示允许准确的跨模态预测。虽然性能可能会随着数据集复杂性的变化而有所不同,但MoRE-GNN提供了一种适应性强、可扩展和可解释的框架,用于促进多组学整合。
更新时间: 2025-10-08 10:48:15
领域: cs.LG,cs.AI
SaFeR-VLM: Toward Safety-aware Fine-grained Reasoning in Multimodal Models
Multimodal Large Reasoning Models (MLRMs) demonstrate impressive cross-modal reasoning but often amplify safety risks under adversarial or unsafe prompts, a phenomenon we call the \textit{Reasoning Tax}. Existing defenses mainly act at the output level and do not constrain the reasoning process, leaving models exposed to implicit risks. In this paper, we propose SaFeR-VLM, a safety-aligned reinforcement learning framework that embeds safety directly into multimodal reasoning. The framework integrates four components: (I) QI-Safe-10K, a curated dataset emphasizing safety-critical and reasoning-sensitive cases; (II) safety-aware rollout, where unsafe generations undergo reflection and correction instead of being discarded; (III) structured reward modeling with multi-dimensional weighted criteria and explicit penalties for hallucinations and contradictions; and (IV) GRPO optimization, which reinforces both safe and corrected trajectories. This unified design shifts safety from a passive safeguard to an active driver of reasoning, enabling scalable and generalizable safety-aware reasoning. SaFeR-VLM further demonstrates robustness against both explicit and implicit risks, supporting dynamic and interpretable safety decisions beyond surface-level filtering. SaFeR-VLM-3B achieves average performance $70.13$ and $78.97$ on safety and helpfulness across six benchmarks, surpassing both same-scale and $>10\times$ larger models such as Skywork-R1V3-38B, Qwen2.5VL-72B, and GLM4.5V-106B. Remarkably, SaFeR-VLM-7B benefits from its increased scale to surpass GPT-5-mini and Gemini-2.5-Flash by \num{6.47} and \num{16.76} points respectively on safety metrics, achieving this improvement without any degradation in helpfulness performance. Our codes are available at https://github.com/HarveyYi/SaFeR-VLM.
Updated: 2025-10-08 10:39:12
标题: SaFeR-VLM: 朝着多模态模型中的安全感知细粒度推理
摘要: 多模态大推理模型(MLRMs)展示了令人印象深刻的跨模态推理能力,但往往在对抗性或不安全的提示下放大了安全风险,这种现象被我们称为\textit{推理税}。现有的防御主要在输出级别起作用,而不约束推理过程,使模型暴露于隐含风险之中。在本文中,我们提出了SaFeR-VLM,这是一个将安全直接嵌入多模态推理的安全对齐强化学习框架。该框架集成了四个组件:(I)QI-Safe-10K,一个强调安全关键和推理敏感案例的策划数据集;(II)安全感知展开,不安全的生成经历反思和纠正,而不是被丢弃;(III)结构化奖励建模,使用多维加权标准和对幻觉和矛盾的明确惩罚;以及(IV)GRPO优化,同时强化安全和纠正的轨迹。这种统一设计将安全从被动保护转变为推理的主动驱动,实现可扩展和可泛化的安全感知推理。SaFeR-VLM进一步展示了对显式和隐式风险的鲁棒性,支持动态和可解释的安全决策,超越了表面级别过滤。SaFeR-VLM-3B在六个基准测试中实现了平均性能$70.13$和$78.97$,在安全性和帮助性上超过了同等规模和$>10\times$更大的模型,如Skywork-R1V3-38B、Qwen2.5VL-72B和GLM4.5V-106B。值得注意的是,SaFeR-VLM-7B由于规模增加而超过了GPT-5-mini和Gemini-2.5-Flash,分别在安全性指标上实现了6.47和16.76个点的提高,而在帮助性表现上没有任何降级。我们的代码可在https://github.com/HarveyYi/SaFeR-VLM找到。
更新时间: 2025-10-08 10:39:12
领域: cs.LG,cs.CV
Multi-hop Deep Joint Source-Channel Coding with Deep Hash Distillation for Semantically Aligned Image Retrieval
We consider image transmission via deep joint source-channel coding (DeepJSCC) over multi-hop additive white Gaussian noise (AWGN) channels by training a DeepJSCC encoder-decoder pair with a pre-trained deep hash distillation (DHD) module to semantically cluster images, facilitating security-oriented applications through enhanced semantic consistency and improving the perceptual reconstruction quality. We train the DeepJSCC module to both reduce mean square error (MSE) and minimize cosine distance between DHD hashes of source and reconstructed images. Significantly improved perceptual quality as a result of semantic alignment is illustrated for different multi-hop settings, for which classical DeepJSCC may suffer from noise accumulation, measured by the learned perceptual image patch similarity (LPIPS) metric.
Updated: 2025-10-08 10:38:24
标题: 多跳深度联合源信道编码与深度哈希蒸馏用于语义对齐图像检索
摘要: 我们考虑通过深度联合源-信道编码(DeepJSCC)在多跳加性白噪声(AWGN)信道上传输图像,通过训练具有预训练的深度哈希蒸馏(DHD)模块的DeepJSCC编码器-解码器对来对图像进行语义聚类,从而通过增强语义一致性促进面向安全的应用,并提高感知重建质量。我们训练DeepJSCC模块既减少均方误差(MSE)又最小化源图像和重建图像的DHD哈希之间的余弦距离。通过语义对齐带来的显著提高的感知质量在不同多跳设置下得到了说明,对于这些设置,传统的DeepJSCC可能会受到噪声积累的影响,该影响可通过学习的感知图像补丁相似度(LPIPS)指标来衡量。
更新时间: 2025-10-08 10:38:24
领域: cs.IT,cs.AI,cs.CR,cs.LG,math.IT
Towards Generalization of Graph Neural Networks for AC Optimal Power Flow
AC Optimal Power Flow (ACOPF) is computationally expensive for large-scale power systems, with conventional solvers requiring prohibitive solution times. Machine learning approaches offer computational speedups but struggle with scalability and topology adaptability without expensive retraining. To enable scalability across grid sizes and adaptability to topology changes, we propose a Hybrid Heterogeneous Message Passing Neural Network (HH-MPNN). HH-MPNN models buses, generators, loads, shunts, transmission lines and transformers as distinct node or edge types, combined with a scalable transformer model for handling long-range dependencies. On grids from 14 to 2,000 buses, HH-MPNN achieves less than 1% optimality gap on default topologies. Applied zero-shot to thousands of unseen topologies, HH-MPNN achieves less than 3% optimality gap despite training only on default topologies. Pre-training on smaller grids also improves results on a larger grid. Computational speedups reach 1,000x to 10,000x compared to interior point solvers. These results advance practical, generalizable machine learning for real-time power system operations.
Updated: 2025-10-08 10:28:46
标题: 朝着通用化的图神经网络在交流最优潮流中的推广
摘要: 交流电最优功率流(ACOPF)对于大规模电力系统来说计算成本高昂,传统求解器需要耗费昂贵的解决时间。机器学习方法提供了计算速度的提升,但在没有昂贵的重新训练的情况下,很难实现可扩展性和拓扑适应性。为了实现跨网络尺寸的可扩展性和适应拓扑变化,我们提出了一种混合异质消息传递神经网络(HH-MPNN)。HH-MPNN将母线、发电机、负载、隔离器、输电线路和变压器建模为不同的节点或边类型,结合可扩展的变压器模型来处理长距离依赖关系。在从14到2,000个母线的电网上,HH-MPNN在默认拓扑上实现了不到1%的最优性差距。应用于数千个未见拓扑的零样本,HH-MPNN在仅训练默认拓扑的情况下,实现了不到3%的最优性差距。在较小的电网上进行预训练也提高了在较大电网上的结果。与内点求解器相比,计算速度提高了1,000倍至10,000倍。这些结果推动了实时电力系统运营的实用性、可泛化的机器学习。
更新时间: 2025-10-08 10:28:46
领域: cs.LG,cs.AI
Flow Matching for Robust Simulation-Based Inference under Model Misspecification
Simulation-based inference (SBI) is transforming experimental sciences by enabling parameter estimation in complex non-linear models from simulated data. A persistent challenge, however, is model misspecification: simulators are only approximations of reality, and mismatches between simulated and real data can yield biased or overconfident posteriors. We address this issue by introducing Flow Matching Corrected Posterior Estimation (FMCPE), a framework that leverages the flow matching paradigm to refine simulation-trained posterior estimators using a small set of real calibration samples. Our approach proceeds in two stages: first, a posterior approximator is trained on abundant simulated data; second, flow matching transports its predictions toward the true posterior supported by real observations, without requiring explicit knowledge of the misspecification. This design enables FMCPE to combine the scalability of SBI with robustness to distributional shift. Across synthetic benchmarks and real-world datasets, we show that our proposal consistently mitigates the effects of misspecification, delivering improved inference accuracy and uncertainty calibration compared to standard SBI baselines, while remaining computationally efficient.
Updated: 2025-10-08 10:25:46
标题: 模型错误规范下的鲁棒基于仿真推断的流匹配
摘要: 基于模拟的推断(SBI)正在通过在模拟数据中实现复杂非线性模型的参数估计而改变实验科学。然而,一个持久的挑战是模型错误规定:模拟器只是现实的近似,模拟数据与真实数据之间的不匹配可能导致偏倚或过度自信的后验。我们通过引入流匹配校正后验估计(FMCPE)框架来解决这个问题,这个框架利用流匹配范式来利用一小组真实校准样本来改进经过模拟训练的后验估计器。我们的方法分为两个阶段:首先,在丰富的模拟数据上训练后验估计器;其次,流匹配将其预测向真实观测支持的真实后验方向传输,而无需明确了解错误规定。这种设计使得FMCPE能够将SBI的可扩展性与对分布变化的稳健性相结合。通过合成基准和真实世界数据集,我们展示了我们的提案始终减轻错误规定的影响,提供了更好的推断准确性和不确定性校准,而且保持计算效率。
更新时间: 2025-10-08 10:25:46
领域: stat.ML,cs.LG
Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
Representation Engineering (RepE) is a novel paradigm for controlling the behavior of LLMs. Unlike traditional approaches that modify inputs or fine-tune the model, RepE directly manipulates the model's internal representations. As a result, it may offer more effective, interpretable, data-efficient, and flexible control over models' behavior. We present the first comprehensive survey of RepE for LLMs, reviewing the rapidly growing literature to address key questions: What RepE methods exist and how do they differ? For what concepts and problems has RepE been applied? What are the strengths and weaknesses of RepE compared to other methods? To answer these, we propose a unified framework describing RepE as a pipeline comprising representation identification, operationalization, and control. We posit that while RepE methods offer significant potential, challenges remain, including managing multiple concepts, ensuring reliability, and preserving models' performance. Towards improving RepE, we identify opportunities for experimental and methodological improvements and construct a guide for best practices.
Updated: 2025-10-08 10:19:34
标题: 分类学、大语言模型表示工程的机遇和挑战
摘要: Representation Engineering (RepE)是一种控制LLM行为的新范式。与修改输入或微调模型的传统方法不同,RepE直接操作模型的内部表示。因此,它可能提供更有效、可解释、数据效率高和灵活控制模型行为的方法。我们提出了第一个关于LLM的RepE的综合调查,回顾了迅速增长的文献,以解决关键问题:存在哪些RepE方法,它们有何不同?RepE已被应用于哪些概念和问题?与其他方法相比,RepE的优势和劣势是什么?为了回答这些问题,我们提出了一个描述RepE的统一框架,将其视为一个包括表示识别、操作化和控制的流程。我们认为,虽然RepE方法具有巨大潜力,但仍然存在挑战,包括管理多个概念、确保可靠性和保留模型性能。为了改进RepE,我们确定了实验和方法的改进机会,并制定了最佳实践指南。
更新时间: 2025-10-08 10:19:34
领域: cs.LG,cs.CL
P3D: Scalable Neural Surrogates for High-Resolution 3D Physics Simulations with Global Context
We present a scalable framework for learning deterministic and probabilistic neural surrogates for high-resolution 3D physics simulations. We introduce a hybrid CNN-Transformer backbone architecture targeted for 3D physics simulations, which significantly outperforms existing architectures in terms of speed and accuracy. Our proposed network can be pretrained on small patches of the simulation domain, which can be fused to obtain a global solution, optionally guided via a fast and scalable sequence-to-sequence model to include long-range dependencies. This setup allows for training large-scale models with reduced memory and compute requirements for high-resolution datasets. We evaluate our backbone architecture against a large set of baseline methods with the objective to simultaneously learn the dynamics of 14 different types of PDEs in 3D. We demonstrate how to scale our model to high-resolution isotropic turbulence with spatial resolutions of up to $512^3$. Finally, we demonstrate the versatility of our network by training it as a diffusion model to produce probabilistic samples of highly turbulent 3D channel flows across varying Reynolds numbers, accurately capturing the underlying flow statistics.
Updated: 2025-10-08 10:19:07
标题: P3D:具有全局上下文的高分辨率3D物理模拟的可扩展神经替代方案
摘要: 我们提出了一个可扩展的框架,用于学习确定性和概率性神经替代物,用于高分辨率3D物理模拟。我们引入了一个针对3D物理模拟的混合CNN-Transformer骨干架构,其在速度和准确性方面显著优于现有架构。我们提出的网络可以在模拟域的小块上进行预训练,这些小块可以融合以获得全局解决方案,可选择通过快速且可扩展的序列到序列模型进行引导以包括长距离依赖关系。这种设置允许为高分辨率数据集训练大规模模型,减少内存和计算需求。我们将我们的骨干架构与大量基准方法进行评估,目的是同时学习3D中14种不同类型的PDE动力学。我们演示了如何将我们的模型扩展到高分辨率各向同性湍流,空间分辨率高达$512^3$。最后,我们演示了我们网络的多功能性,通过将其训练为扩散模型,生成高度湍流的3D通道流的概率样本,跨不同的雷诺数,准确捕捉底层流统计数据。
更新时间: 2025-10-08 10:19:07
领域: cs.LG
Enhancing Bankruptcy Prediction of Banks through Advanced Machine Learning Techniques: An Innovative Approach and Analysis
Context: Financial system stability is determined by the condition of the banking system. A bank failure can destroy the stability of the financial system, as banks are subject to systemic risk, affecting not only individual banks but also segments or the entire financial system. Calculating the probability of a bank going bankrupt is one way to ensure the banking system is safe and sound. Existing literature and limitations: Statistical models, such as Altman's Z-Score, are one of the common techniques for developing a bankruptcy prediction model. However, statistical methods rely on rigid and sometimes irrelevant assumptions, which can result in low forecast accuracy. New approaches are necessary. Objective of the research: Bankruptcy models are developed using machine learning techniques, such as logistic regression (LR), random forest (RF), and support vector machines (SVM). According to several studies, machine learning is also more accurate and effective than statistical methods for categorising and forecasting banking risk management. Present Research: The commercial bank data are derived from the annual financial statements of 44 active banks and 21 bankrupt banks in Turkey from 1994 to 2004, and the rural bank data are derived from the quarterly financial reports of 43 active and 43 bankrupt rural banks in Indonesia between 2013 and 2019. Five rural banks in Indonesia have also been selected to demonstrate the feasibility of analysing bank bankruptcy trends. Findings and implications: The results of the research experiments show that RF can forecast data from commercial banks with a 90% accuracy rate. Furthermore, the three machine learning methods proposed accurately predict the likelihood of rural bank bankruptcy. Contribution and Conclusion: The proposed innovative machine learning approach help to implement policies that reduce the costs of bankruptcy.
Updated: 2025-10-08 10:16:10
标题: 通过先进的机器学习技术提升银行破产预测能力:一种创新方法和分析
摘要: 背景:金融体系的稳定性取决于银行系统的状况。银行倒闭可能破坏金融体系的稳定性,因为银行面临系统风险,不仅影响个别银行,还影响金融体系的某些部分或整个体系。计算银行破产的概率是确保银行系统安全稳健的一种方式。现有文献和限制:统计模型,如Altman的Z-Score,是发展破产预测模型的常见技术之一。然而,统计方法依赖刚性且有时不相关的假设,可能导致预测准确性低。需要新的方法。研究目标:使用机器学习技术,如逻辑回归(LR)、随机森林(RF)和支持向量机(SVM),开发破产模型。根据几项研究,机器学习也比统计方法更准确、更有效地分类和预测银行风险管理。当前研究:商业银行数据源自1994年至2004年土耳其44家活跃银行和21家破产银行的年度财务报表,而农村银行数据源自2013年至2019年印度尼西亚43家活跃和43家破产农村银行的季度财务报告。另外,还选择了印度尼西亚的五家农村银行,以展示分析银行破产趋势的可行性。结果和影响:研究实验结果表明,随机森林可以以90%的准确率预测商业银行的数据。此外,提出的三种机器学习方法准确地预测了农村银行破产的可能性。贡献和结论:提出的创新机器学习方法有助于实施降低破产成本的政策。
更新时间: 2025-10-08 10:16:10
领域: cs.LG,cs.AI
GRPO is Secretly a Process Reward Model
We prove theoretically that the GRPO RL algorithm induces a non-trivial process reward model (PRM), under certain assumptions regarding within-group overlap of token sequences across completions. We then show empirically that these assumptions are met under real-world conditions: GRPO does in fact induce a non-trivial PRM. Leveraging the framework of GRPO-as-a-PRM, we identify a flaw in the GRPO objective: non-uniformly distributed process steps hinder both exploration and exploitation (under different conditions). We propose a simple modification to the algorithm to mitigate this defect ($\lambda$-GRPO), and show that LLMs trained with $\lambda$-GRPO achieve higher validation accuracy and performance on downstream reasoning tasks$-$and reach peak performance more rapidly$-$than LLMs trained with standard GRPO. Our results call into question the advantage of costly, explicitly-defined PRMs for GRPO: we show that it is possible to instead leverage the hidden, built-in PRM structure within the vanilla GRPO algorithm to boost model performance with a negligible impact on training time and cost.
Updated: 2025-10-08 10:13:42
标题: GRPO是一个秘密的过程奖励模型
摘要: 我们在理论上证明了GRPO强化学习算法在某些关于组内令牌序列重叠的假设下会引发一个非平凡的过程奖励模型(PRM)。然后我们通过实证研究证明了这些假设在现实条件下是成立的:实际上,GRPO确实引发了一个非平凡的PRM。利用GRPO作为PRM的框架,我们发现了GRPO目标中的一个缺陷:非均匀分布的过程步骤会阻碍探索和开发(在不同条件下)。我们提出了一种简单的算法修改来减轻这个缺陷($\lambda$-GRPO),并展示了用$\lambda$-GRPO训练的LLMs在验证准确性和下游推理任务上的性能更高,并且达到峰值性能的速度更快,比用标准GRPO训练的LLMs表现更好。我们的结果质疑了为GRPO定义昂贵的显式PRM的优势:我们展示了可以利用基础的、隐藏的PRM结构来提升模型性能,而对训练时间和成本几乎没有影响。
更新时间: 2025-10-08 10:13:42
领域: cs.LG,cs.AI
Reconquering Bell sampling on qudits: stabilizer learning and testing, quantum pseudorandomness bounds, and more
Bell sampling is a simple yet powerful tool based on measuring two copies of a quantum state in the Bell basis, and has found applications in a plethora of problems related to stabiliser states and measures of magic. However, it was not known how to generalise the procedure from qubits to $d$-level systems -- qudits -- for all dimensions $d > 2$ in a useful way. Indeed, a prior work of the authors (arXiv'24) showed that the natural extension of Bell sampling to arbitrary dimensions fails to provide meaningful information about the quantum states being measured. In this paper, we overcome the difficulties encountered in previous works and develop a useful generalisation of Bell sampling to qudits of all $d\geq 2$. At the heart of our primitive is a new unitary, based on Lagrange's four-square theorem, that maps four copies of any stabiliser state $|\mathcal{S}\rangle$ to four copies of its complex conjugate $|\mathcal{S}^\ast\rangle$ (up to some Pauli operator), which may be of independent interest. We then demonstrate the utility of our new Bell sampling technique by lifting several known results from qubits to qudits for any $d\geq 2$: 1. Learning stabiliser states in $O(n^3)$ time with $O(n)$ samples; 2. Solving the Hidden Stabiliser Group Problem in $\tilde{O}(n^3/\varepsilon)$ time with $\tilde{O}(n/\varepsilon)$ samples; 3. Testing whether $|\psi\rangle$ has stabiliser size at least $d^t$ or is $\varepsilon$-far from all such states in $\tilde{O}(n^3/\varepsilon)$ time with $\tilde{O}(n/\varepsilon)$ samples; 4. Clifford circuits with at most $n/2$ single-qudit non-Clifford gates cannot prepare pseudorandom states; 5. Testing whether $|\psi\rangle$ has stabiliser fidelity at least $1-\varepsilon_1$ or at most $1-\varepsilon_2$ with $O(d^2/\varepsilon_2)$ samples if $\varepsilon_1 = 0$ or $O(d^2/\varepsilon_2^2)$ samples if $\varepsilon_1 = O(d^{-2})$.
Updated: 2025-10-08 10:13:16
标题: 重新夺回关于qudit的贝尔采样:稳定器学习和测试,量子伪随机性界限,以及更多
摘要: 贝尔采样是一种简单但强大的工具,基于在贝尔基础上测量量子态的两个副本,并在与稳定态和魔术度量相关的众多问题中找到应用。然而,如何将该过程从量子比特推广到所有维度$d > 2$ 的$d$级系统——qudits——以一种有用的方式,一直是未知的。事实上,作者之前的一项工作(arXiv'24)表明,将贝尔采样自然推广到任意维度无法提供关于被测量的量子态的有意义信息。在本文中,我们克服了先前工作中遇到的困难,并开发了一个有用的将贝尔采样推广到所有$d\geq 2$的qudits的方法。在我们的方法的核心是一个基于拉格朗日四平方定理的新酉矩阵,它将任意稳定态$|\mathcal{S}\rangle$的四个副本映射到其复共轭$|\mathcal{S}^\ast\rangle$的四个副本(可能会有一些Pauli算符),这可能是独立感兴趣的。然后,我们通过将几个已知结果从量子比特提升到所有$d\geq 2$的qudits来展示我们新的贝尔采样技术的实用性: 1. 在$O(n^3)$时间内用$O(n)$个样本学习稳定态; 2. 在$\tilde{O}(n^3/\varepsilon)$时间内用$\tilde{O}(n/\varepsilon)$个样本解决隐藏稳定群问题; 3. 在$\tilde{O}(n^3/\varepsilon)$时间内用$\tilde{O}(n/\varepsilon)$个样本测试$|\psi\rangle$的稳定器大小至少为$d^t$或者与所有这种状态$\varepsilon$远的状态; 4. 具有最多$n/2$个单qudit非Clifford门的Clifford电路不能准备伪随机状态; 5. 如果$\varepsilon_1 = 0$,则测试$|\psi\rangle$的稳定器保真度至少为$1-\varepsilon_1$或者至多为$1-\varepsilon_2$,如果$\varepsilon_1 = O(d^{-2})$,则用$O(d^2/\varepsilon_2)$个样本进行测试。
更新时间: 2025-10-08 10:13:16
领域: quant-ph,cs.CC,cs.DS,cs.LG
When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity
LLM-judged benchmarks are increasingly used to evaluate complex model behaviors, yet their design introduces failure modes absent in conventional ground-truth based benchmarks. We argue that without tight objectives and verifiable constructions, benchmark rankings can produce high-confidence rankings that are in fact largely noise. We introduce two mechanisms to diagnose these issues. Schematic adherence quantifies how much of a judge's overall verdict is explained by the explicit evaluation schema, revealing unexplained variance when judges deviate from their own rubric. Psychometric validity aggregates internal consistency and discriminant validity signals to quantify irreducible uncertainty in any benchmarking run. Applying these tools to Arena-Hard Auto, we find severe schema incoherence and factor collapse across popular judges: for example, unexplained variance exceeding 90 percent for DeepSeek-R1-32B and factor correlations above 0.93 for most criteria. We also show that the ELO-style aggregation used by Arena-Hard Auto collapses and masks genuine ranking uncertainty. Our results highlight design failures that undermine validity and offer actionable principles for building better-scoped, reliability-aware LLM-judged benchmarks. We released our code and dataset at https://github.com/penfever/judgment-to-noise
Updated: 2025-10-08 10:11:46
标题: 当判断成为噪音:LLM法官标准中设计失误如何悄然破坏有效性
摘要: LLM-judged benchmarks越来越被用来评估复杂模型的行为,然而它们的设计引入了传统基于真实数据的基准中不存在的失效模式。我们认为,如果没有明确的目标和可验证的构造,基准排名可能会产生实际上主要是噪音的高可信度排名。我们引入了两种机制来诊断这些问题。图式依从度量化了评委整体裁决有多少是由明确的评估模式解释的,当评委偏离自己的评分标准时揭示了未解释的差异。心理测量学有效性聚合了内部一致性和辨别效度信号,以量化任何基准运行中不可减少的不确定性。将这些工具应用于Arena-Hard Auto,我们发现流行评委之间存在严重的图式不一致和因子崩溃:例如,DeepSeek-R1-32B的未解释差异超过90%,大多数标准的因子相关性超过0.93。我们还表明,Arena-Hard Auto使用的ELO风格聚合导致和掩盖了真正的排名不确定性。我们的结果突显了损害有效性的设计失败,并提供了建立更好范围、可靠性感知的LLM评委基准的可操作原则。我们在https://github.com/penfever/judgment-to-noise发布了我们的代码和数据集。
更新时间: 2025-10-08 10:11:46
领域: cs.LG,cs.AI
Quantum Rationale-Aware Graph Contrastive Learning for Jet Discrimination
In high-energy physics, particle jet tagging plays a pivotal role in distinguishing quark from gluon jets using data from collider experiments. While graph-based deep learning methods have advanced this task beyond traditional feature-engineered approaches, the complex data structure and limited labeled samples present ongoing challenges. However, existing contrastive learning (CL) frameworks struggle to leverage rationale-aware augmentations effectively, often lacking supervision signals that guide the extraction of salient features and facing computational efficiency issues such as high parameter counts. In this study, we demonstrate that integrating a quantum rationale generator (QRG) within our proposed Quantum Rationale-aware Graph Contrastive Learning (QRGCL) framework significantly enhances jet discrimination performance, reducing reliance on labeled data and capturing discriminative features. Evaluated on the quark-gluon jet dataset, QRGCL achieves an AUC score of $77.53\%$ while maintaining a compact architecture of only 45 QRG parameters, outperforming classical, quantum, and hybrid GCL and GNN benchmarks. These results highlight QRGCL's potential to advance jet tagging and other complex classification tasks in high-energy physics, where computational efficiency and feature extraction limitations persist.
Updated: 2025-10-08 10:09:02
标题: 量子理性感知图对比学习用于喷流辨别
摘要: 在高能物理学中,粒子喷注标记在利用来自对撞机实验的数据区分夸克喷注和胶子喷注方面发挥着关键作用。虽然基于图的深度学习方法已经将这一任务推进到了超越传统特征工程方法的水平,但复杂的数据结构和有限的标记样本仍然带来持续的挑战。然而,现有的对比学习(CL)框架往往难以有效利用理性感知增强,经常缺乏指导提取显著特征的监督信号,并面临高参数计数等计算效率问题。在本研究中,我们展示了将量子理性生成器(QRG)集成到我们提出的量子理性感知图对比学习(QRGCL)框架中,显著提高了喷注鉴别性能,减少对标记数据的依赖并捕获差异特征。在夸克-胶子喷注数据集上评估,QRGCL实现了$77.53\%$的AUC分数,仅使用45个QRG参数的紧凑架构,优于古典、量子和混合GCL和GNN基准。这些结果突显了QRGCL在推进高能物理学中的喷注标记和其他复杂分类任务方面的潜力,其中计算效率和特征提取限制仍然存在。
更新时间: 2025-10-08 10:09:02
领域: cs.LG,hep-ph
FedAGHN: Personalized Federated Learning with Attentive Graph HyperNetworks
Personalized Federated Learning (PFL) aims to address the statistical heterogeneity of data across clients by learning the personalized model for each client. Among various PFL approaches, the personalized aggregation-based approach conducts parameter aggregation in the server-side aggregation phase to generate personalized models, and focuses on learning appropriate collaborative relationships among clients for aggregation. However, the collaborative relationships vary in different scenarios and even at different stages of the FL process. To this end, we propose Personalized Federated Learning with Attentive Graph HyperNetworks (FedAGHN), which employs Attentive Graph HyperNetworks (AGHNs) to dynamically capture fine-grained collaborative relationships and generate client-specific personalized initial models. Specifically, AGHNs empower graphs to explicitly model the client-specific collaborative relationships, construct collaboration graphs, and introduce tunable attentive mechanism to derive the collaboration weights, so that the personalized initial models can be obtained by aggregating parameters over the collaboration graphs. Extensive experiments can demonstrate the superiority of FedAGHN. Moreover, a series of visualizations are presented to explore the effectiveness of collaboration graphs learned by FedAGHN.
Updated: 2025-10-08 10:08:45
标题: FedAGHN:具有关注图超网络的个性化联邦学习
摘要: 个性化联邦学习(PFL)旨在通过为每个客户端学习个性化模型来解决客户端数据的统计异质性。在各种PFL方法中,基于个性化聚合的方法在服务器端聚合阶段进行参数聚合以生成个性化模型,并侧重于学习适当的客户端之间的协作关系以进行聚合。然而,在不同场景甚至不同FL过程的不同阶段中,协作关系会有所变化。因此,我们提出了使用关注图超网络(AGHNs)的个性化联邦学习(FedAGHN),它利用AGHNs动态捕捉细粒度的协作关系,并生成客户端特定的个性化初始模型。具体来说,AGHNs使图能够明确地建模客户端特定的协作关系,构建协作图,并引入可调节的关注机制来推导协作权重,从而通过在协作图上聚合参数来获得个性化初始模型。大量实验证明了FedAGHN的优越性。此外,我们提供了一系列可视化图表来探索FedAGHN学习到的协作图的有效性。
更新时间: 2025-10-08 10:08:45
领域: cs.LG,cs.AI
CNN-TFT explained by SHAP with multi-head attention weights for time series forecasting
Convolutional neural networks (CNNs) and transformer architectures offer strengths for modeling temporal data: CNNs excel at capturing local patterns and translational invariances, while transformers effectively model long-range dependencies via self-attention. This paper proposes a hybrid architecture integrating convolutional feature extraction with a temporal fusion transformer (TFT) backbone to enhance multivariate time series forecasting. The CNN module first applies a hierarchy of one-dimensional convolutional layers to distill salient local patterns from raw input sequences, reducing noise and dimensionality. The resulting feature maps are then fed into the TFT, which applies multi-head attention to capture both short- and long-term dependencies and to weigh relevant covariates adaptively. We evaluate the CNN-TFT on a hydroelectric natural flow time series dataset. Experimental results demonstrate that CNN-TFT outperforms well-established deep learning models, with a mean absolute percentage error of up to 2.2%. The explainability of the model is obtained by a proposed Shapley additive explanations with multi-head attention weights (SHAP-MHAW). Our novel architecture, named CNN-TFT-SHAP-MHAW, is promising for applications requiring high-fidelity, multivariate time series forecasts, being available for future analysis at https://github.com/SFStefenon/CNN-TFT-SHAP-MHAW .
Updated: 2025-10-08 10:08:28
标题: 用SHAP解释的多头注意力权重的CNN-TFT在时间序列预测中的应用
摘要: 卷积神经网络(CNN)和变压器架构在建模时间数据方面具有优势:CNN擅长捕捉局部模式和平移不变性,而变压器通过自注意力有效地建模长距离依赖关系。本文提出了一种混合架构,将卷积特征提取与时间融合变压器(TFT)骨干相结合,以增强多变量时间序列预测。CNN模块首先应用一系列一维卷积层的层次结构,从原始输入序列中提取显著的局部模式,降低噪声和维度。然后将得到的特征映射输入TFT,TFT应用多头注意力来捕捉短期和长期依赖关系,并自适应地权衡相关协变量。我们将CNN-TFT应用于一个水力发电自然流量时间序列数据集。实验结果表明,CNN-TFT优于成熟的深度学习模型,平均绝对百分比误差高达2.2%。该模型的可解释性是通过提出的带有多头注意力权重(SHAP-MHAW)的Shapley可加解释获得的。我们的新颖架构,命名为CNN-TFT-SHAP-MHAW,适用于需要高保真度、多变量时间序列预测的应用,并可在https://github.com/SFStefenon/CNN-TFT-SHAP-MHAW 进行未来分析。
更新时间: 2025-10-08 10:08:28
领域: cs.LG,cs.AI
The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives
The objectives that Large Language Models (LLMs) implicitly optimize remain dangerously opaque, making trustworthy alignment and auditing a grand challenge. While Inverse Reinforcement Learning (IRL) can infer reward functions from behaviour, existing approaches either produce a single, overconfident reward estimate or fail to address the fundamental ambiguity of the task (non-identifiability). This paper introduces a principled auditing framework that re-frames reward inference from a simple estimation task to a comprehensive process for verification. Our framework leverages Bayesian IRL to not only recover a distribution over objectives but to enable three critical audit capabilities: (i) Quantifying and systematically reducing non-identifiability by demonstrating posterior contraction over sequential rounds of evidence; (ii) Providing actionable, uncertainty-aware diagnostics that expose spurious shortcuts and identify out-of-distribution prompts where the inferred objective cannot be trusted; and (iii) Validating policy-level utility by showing that the refined, low-uncertainty reward can be used directly in RLHF to achieve training dynamics and toxicity reductions comparable to the ground-truth alignment process. Empirically, our framework successfully audits a detoxified LLM, yielding a well-calibrated and interpretable objective that strengthens alignment guarantees. Overall, this work provides a practical toolkit for auditors, safety teams, and regulators to verify what LLMs are truly trying to achieve, moving us toward more trustworthy and accountable AI.
Updated: 2025-10-08 10:07:14
标题: 《校准审计员:用于验证和细化LLM目标的贝叶斯框架》
摘要: 大型语言模型(LLM)隐含优化的目标仍然是危险地不透明的,使得可信的对齐和审计成为一个巨大的挑战。虽然逆强化学习(IRL)可以从行为中推断奖励函数,但现有方法要么产生一个单一、过于自信的奖励估计,要么未能解决任务的基本模糊性(非可辨识性)。本文介绍了一个基于原则的审计框架,将奖励推断从简单的估计任务转变为一个全面的验证过程。我们的框架利用贝叶斯IRL不仅恢复目标的分布,而且实现了三个关键的审计能力:(i)通过展示证据的序贯轮次上的后验收缩来量化和系统地减少非可辨识性;(ii)提供可行的、关于不确定性的诊断,揭示虚假的快捷方式并识别推断目标不能被信任的分布之外的提示;以及(iii)通过显示经过精心调整的、低不确定性的奖励可以直接在RLHF中使用,以实现与基准对齐过程相当的训练动态和毒性减少,从而验证策略级效用。在经验上,我们的框架成功审计了一个经过排毒的LLM,产生了一个经过良好校准和可解释的目标,加强了对齐保证。总的来说,这项工作为审计员、安全团队和监管机构提供了一个实用的工具包,用于验证LLM真正试图实现的目标,使我们朝着更加可信赖和负责任的人工智能迈进。
更新时间: 2025-10-08 10:07:14
领域: cs.LG,cs.CL
AC-LoRA: (Almost) Training-Free Access Control-Aware Multi-Modal LLMs
Corporate LLMs are gaining traction for efficient knowledge dissemination and management within organizations. However, as current LLMs are vulnerable to leaking sensitive information, it has proven difficult to apply them in settings where strict access control is necessary. To this end, we design AC-LoRA, an end-to-end system for access control-aware corporate LLM chatbots that maintains a strong information isolation guarantee. AC-LoRA maintains separate LoRA adapters for permissioned datasets, along with the document embedding they are finetuned on. AC-LoRA retrieves a precise set of LoRA adapters based on the similarity score with the user query and their permission. This similarity score is later used to merge the responses if more than one LoRA is retrieved, without requiring any additional training for LoRA routing. We provide an end-to-end prototype of AC-LoRA, evaluate it on two datasets, and show that AC-LoRA matches or even exceeds the performance of state-of-the-art LoRA mixing techniques while providing strong isolation guarantees. Furthermore, we show that AC-LoRA design can be directly applied to different modalities.
Updated: 2025-10-08 10:01:30
标题: AC-LoRA:(几乎)无需训练的访问控制感知多模态LLMs
摘要: 企业LLMs正在获得越来越多的关注,以实现组织内部的有效知识传播和管理。然而,由于当前的LLMs容易泄露敏感信息,在需要严格访问控制的环境中应用它们已被证明是困难的。为此,我们设计了AC-LoRA,这是一个面向访问控制意识的企业LLM聊天机器人的端到端系统,可以保证强大的信息隔离。AC-LoRA为许可数据集维护单独的LoRA适配器,以及它们进行微调的文档嵌入。AC-LoRA基于与用户查询的相似度得分和他们的权限,检索出精确的LoRA适配器集合。如果检索到多个LoRA,则后续会使用这个相似度得分合并响应,而无需为LoRA路由进行任何额外的训练。我们提供了AC-LoRA的端到端原型,对两个数据集进行了评估,并展示了AC-LoRA与最先进的LoRA混合技术表现相匹配甚至超越的性能,同时提供了强大的隔离保证。此外,我们展示了AC-LoRA设计可以直接应用于不同的形式。
更新时间: 2025-10-08 10:01:30
领域: cs.CR,cs.AI
Vectorized FlashAttention with Low-cost Exponential Computation in RISC-V Vector Processors
Attention is a core operation in numerous machine learning and artificial intelligence models. This work focuses on the acceleration of attention kernel using FlashAttention algorithm, in vector processors, particularly those based on the RISC-V instruction set architecture (ISA). This work represents the first effort to vectorize FlashAttention, minimizing scalar code and simplifying the computational complexity of evaluating exponentials needed by softmax used in attention. By utilizing a low-cost approximation for exponentials in floating-point arithmetic, we reduce the cost of computing the exponential function without the need to extend baseline vector ISA with new custom instructions. Also, appropriate tiling strategies are explored with the goal to improve memory locality. Experimental results highlight the scalability of our approach, demonstrating significant performance gains with the vectorized implementations when processing attention layers in practical applications.
Updated: 2025-10-08 09:55:32
标题: 在RISC-V向量处理器中具有低成本指数计算的矢量化FlashAttention
摘要: 注意力是许多机器学习和人工智能模型中的核心操作。本文关注使用FlashAttention算法在矢量处理器上加速注意力核心操作,特别是基于RISC-V指令集架构(ISA)的处理器。本文代表了第一次尝试对FlashAttention进行矢量化,最小化标量代码并简化用于注意力中的softmax所需的指数计算复杂性。通过在浮点运算中使用低成本的指数近似,我们降低了计算指数函数的成本,而无需通过新的自定义指令扩展基线矢量ISA。此外,本文探讨了适当的切片策略,旨在改善内存局部性。实验结果突出显示了我们方法的可扩展性,在处理实际应用中的注意力层时,通过矢量化实现取得了显著的性能提升。
更新时间: 2025-10-08 09:55:32
领域: cs.LG,cs.DC,cs.PF
Early wind turbine alarm prediction based on machine learning: AlarmForecasting
Alarm data is pivotal in curbing fault behavior in Wind Turbines (WTs) and forms the backbone for advancedpredictive monitoring systems. Traditionally, research cohorts have been confined to utilizing alarm data solelyas a diagnostic tool, merely indicative of unhealthy status. However, this study aims to offer a transformativeleap towards preempting alarms, preventing alarms from triggering altogether, and consequently avertingimpending failures. Our proposed Alarm Forecasting and Classification (AFC) framework is designed on twosuccessive modules: first, the regression module based on long short-term memory (LSTM) for time-series alarmforecasting, and thereafter, the classification module to implement alarm tagging on the forecasted alarm. Thisway, the entire alarm taxonomy can be forecasted reliably rather than a few specific alarms. 14 Senvion MM82turbines with an operational period of 5 years are used as a case study; the results demonstrated 82%, 52%,and 41% accurate forecasts for 10, 20, and 30 min alarm forecasts, respectively. The results substantiateanticipating and averting alarms, which is significant in curbing alarm frequency and enhancing operationalefficiency through proactive intervention.
Updated: 2025-10-08 09:53:49
标题: 基于机器学习的早期风力涡轮机警报预测:AlarmForecasting
摘要: 报警数据在遏制风力涡轮机(WTs)的故障行为中起着关键作用,并且构成了先进预测监测系统的基础。传统上,研究团体一直局限于仅将报警数据用作诊断工具,仅表明状态不健康。然而,本研究旨在实现向预防报警的转变,防止报警完全触发,进而避免即将发生的故障。我们提出的报警预测和分类(AFC)框架设计了两个连续模块:首先是基于长短期记忆(LSTM)的时间序列报警预测的回归模块,然后是分类模块,用于对预测的报警进行标记。这样,整个报警分类可以可靠地预测,而不是仅仅是一些特定的报警。以14台Senvion MM82风力涡轮机作为案例研究;结果表明对于10、20和30分钟的报警预测,准确率分别为82%、52%和41%。结果证实了对报警的预期和避免,这在遏制报警频率并通过积极干预提高运营效率方面具有重要意义。
更新时间: 2025-10-08 09:53:49
领域: cs.LG,physics.app-ph
Recurrence-Complete Frame-based Action Models
In recent years, attention-like mechanisms have been used to great success in the space of large language models, unlocking scaling potential to a previously unthinkable extent. "Attention Is All You Need" famously claims RNN cells are not needed in conjunction with attention. We challenge this view. In this paper, we point to existing proofs that architectures with fully parallelizable forward or backward passes cannot represent classes of problems specifically interesting for long-running agentic tasks. We further conjecture a critical time t beyond which non-recurrence-complete models fail to aggregate inputs correctly, with concrete implications for agentic systems (e.g., software engineering agents). To address this, we introduce a recurrence-complete architecture and train it on GitHub-derived action sequences. Loss follows a power law in the trained sequence length while the parameter count remains fixed. Moreover, longer-sequence training always amortizes its linearly increasing wall-time cost, yielding lower loss as a function of wall time.
Updated: 2025-10-08 09:50:41
标题: 基于帧的行为模型的重复性-完整性
摘要: 近年来,类似于注意力机制的方法在大型语言模型领域取得了巨大成功,将扩展潜力提升到了以前无法想象的程度。“注意力就是一切”这一著名论断声称在注意力的帮助下,不需要RNN单元。我们对这一观点提出质疑。在本文中,我们指出已有证据证明具有完全可并行前向或后向传递的体系结构无法表示长时间运行的主体任务中特别有趣的问题类别。我们进一步推测了一个关键的时间t,超过这个时间非循环完整模型将无法正确地聚合输入,这对主体系统(例如软件工程代理)有具体的影响。为了解决这个问题,我们引入了一个循环完整的体系结构,并对其进行了GitHub衍生的行动序列训练。损失随训练序列长度遵循幂律增长,而参数数量保持不变。此外,更长序列的训练总是能够摊销其线性增长的墙上时间成本,从而作为墙上时间函数的损失降低。
更新时间: 2025-10-08 09:50:41
领域: cs.LG,cs.AI
Efficient numeracy in language models through single-token number embeddings
To drive progress in science and engineering, large language models (LLMs) must be able to process large amounts of numerical data and solve long calculations efficiently. This is currently only possible through the use of external tools or extensive reasoning chains, either limiting the numerical intuition of LLMs or limiting the length of problems they can solve. We show that frontier LLMs require excessive amounts of reasoning tokens to solve even basic calculations, which is exacerbated by their tokenization strategies that split single numbers into multiple tokens. This motivates the need for efficient and effective single-token number encodings. We introduce a set of desiderata for such encodings and show that existing approaches fail to fulfill them. To address these shortcomings, we propose BitTokens, a novel tokenization strategy that embeds any number into a single token using its IEEE 754 binary floating-point representation. Through extensive experiments we show that our BitTokens allow even small language models to learn algorithms that solve basic arithmetic operations nearly perfectly. This newly gained efficiency could expand the length and complexity of problems language models can solve.
Updated: 2025-10-08 09:48:11
标题: 通过单词数字嵌入实现语言模型中的高效数字计算
摘要: 为了推动科学和工程的进步,大型语言模型(LLMs)必须能够处理大量的数值数据并高效地解决长时间的计算。目前,这仅能通过使用外部工具或大量推理链来实现,这两种方法都会限制LLMs的数值直觉或限制它们能够解决的问题长度。我们表明,前沿的LLMs甚至需要过多的推理令牌才能解决基本的计算问题,这一问题被它们的标记化策略所恶化,这种策略会将单个数字拆分为多个令牌。这促使了对高效和有效的单令牌数字编码的需求。我们提出了一系列这种编码的期望,并展示现有方法未能实现这些期望。为了解决这些缺陷,我们提出了BitTokens,一种新颖的标记化策略,它使用其IEEE 754二进制浮点表示将任何数字嵌入到单个令牌中。通过大量实验,我们展示了我们的BitTokens甚至可以让小型语言模型学习几乎完美解决基本算术操作的算法。这种新获得的效率可能会扩展语言模型能够解决的问题长度和复杂度。
更新时间: 2025-10-08 09:48:11
领域: cs.LG
Exposing Citation Vulnerabilities in Generative Engines
We analyze answers generated by generative engines (GEs) from the perspectives of citation publishers and the content-injection barrier, defined as the difficulty for attackers to manipulate answers to user prompts by placing malicious content on the web. GEs integrate two functions: web search and answer generation that cites web pages using large language models. Because anyone can publish information on the web, GEs are vulnerable to poisoning attacks. Existing studies of citation evaluation focus on how faithfully answer content reflects cited sources, leaving unexamined which web sources should be selected as citations to defend against poisoning attacks. To fill this gap, we introduce evaluation criteria that assess poisoning threats using the citation information contained in answers. Our criteria classify the publisher attributes of citations to estimate the content-injection barrier thereby revealing the threat of poisoning attacks in current GEs. We conduct experiments in political domains in Japan and the United States (U.S.) using our criteria and show that citations from official party websites (primary sources) are approximately \(25\%\)--\(45\%\) in the U.S. and \(60\%\)--\(65\%\) in Japan, indicating that U.S. political answers are at higher risk of poisoning attacks. We also find that sources with low content-injection barriers are frequently cited yet are poorly reflected in answer content. To mitigate this threat, we discuss how publishers of primary sources can increase exposure of their web content in answers and show that well-known techniques are limited by language differences.
Updated: 2025-10-08 09:47:48
标题: 揭示生成引擎中的引用漏洞
摘要: 我们从引文出版商和内容注入障碍的角度分析由生成引擎(GEs)生成的答案,内容注入障碍被定义为攻击者通过在网页上放置恶意内容来操纵用户提示的答案的困难程度。GEs集成了两个功能:网络搜索和引文生成,引用了使用大型语言模型的网页。由于任何人都可以在网上发布信息,GEs易受污染攻击。现有的引文评估研究侧重于答案内容如何忠实地反映引用的来源,未考虑应该选择哪些网页来源作为引文以防止污染攻击。为了填补这一空白,我们引入了评估标准,根据答案中包含的引文信息评估毒害威胁。我们的标准分类了引文的出版商属性,以估计内容注入障碍,从而揭示了当前GEs中污染攻击的威胁。我们在日本和美国的政治领域进行实验,使用我们的标准,并显示来自官方政党网站(主要来源)的引文在美国约为25%至45%,在日本为60%至65%,表明美国政治答案更容易受到污染攻击。我们还发现,内容注入障碍较低的来源经常被引用,但在答案内容中反映不佳。为了减轻这一威胁,我们讨论了主要来源的出版商如何增加其网页内容在答案中的曝光,并显示出语言差异限制了众所周知的技术。
更新时间: 2025-10-08 09:47:48
领域: cs.CR,cs.CL,cs.IR
Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking
Multimodal retrieval still leans on embedding-based models like CLIP for fast vector search over pre-computed image embeddings. Yet, unlike text retrieval, where joint-encoder rerankers are standard, comparable vision--language rerankers are largely absent. We find that seminal joint encoders such as BLIP are severely bottlenecked by an expensive visual feature-extraction stage, preventing practical deployment at scale. Motivated by this bottleneck, we introduce EDJE, an Efficient Discriminative Joint Encoder that precomputes vision tokens offline and compresses them via a lightweight attention-based adapter, so online inference runs only a compact joint encoder over a small set of visual tokens plus the text. EDJE preserves strong retrieval performance while drastically reducing storage and online compute, enabling high-throughput inference. Specifically, EDJE processes 50k image--text pairs/second while requiring 49kB of disk storage per image, matching prior art on Flickr (zero-shot) and COCO (fine-tuned) retrieval. The implementation and checkpoints will be made publicly available shortly.
Updated: 2025-10-08 09:46:09
标题: 高效的判别式大规模视觉-语言重新排序联合编码器
摘要: 多模态检索仍然依赖基于嵌入的模型,例如CLIP,用于快速搜索预先计算的图像嵌入向量。然而,与文本检索不同,联合编码器重排器是标准的,可比较的视觉-语言重排器却几乎不存在。我们发现,像BLIP这样的开创性联合编码器受限于昂贵的视觉特征提取阶段,阻碍了在规模上的实际部署。受到这一瓶颈的启发,我们引入了EDJE,一种高效的判别式联合编码器,它离线预计算视觉令牌,并通过轻量级基于注意力的适配器进行压缩,因此在线推断只需对一小组视觉令牌和文本运行一个紧凑的联合编码器。EDJE在大大减少存储和在线计算的同时保持了强大的检索性能,从而实现了高吞吐量推断。具体地,EDJE每秒处理50k个图像-文本对,每个图像只需49kB的磁盘存储,与Flickr(零样本)和COCO(微调)检索的先前技术相匹配。实现和检查点将很快公开提供。
更新时间: 2025-10-08 09:46:09
领域: cs.CV,cs.LG
The Unreasonable Effectiveness of Randomized Representations in Online Continual Graph Learning
Catastrophic forgetting is one of the main obstacles for Online Continual Graph Learning (OCGL), where nodes arrive one by one, distribution drifts may occur at any time and offline training on task-specific subgraphs is not feasible. In this work, we explore a surprisingly simple yet highly effective approach for OCGL: we use a fixed, randomly initialized encoder to generate robust and expressive node embeddings by aggregating neighborhood information, training online only a lightweight classifier. By freezing the encoder, we eliminate drifts of the representation parameters, a key source of forgetting, obtaining embeddings that are both expressive and stable. When evaluated across several OCGL benchmarks, despite its simplicity and lack of memory buffer, this approach yields consistent gains over state-of-the-art methods, with surprising improvements of up to 30% and performance often approaching that of the joint offline-training upper bound. These results suggest that in OCGL, catastrophic forgetting can be minimized without complex replay or regularization by embracing architectural simplicity and stability.
Updated: 2025-10-08 09:44:14
标题: 在线连续图学习中随机表示的不合理有效性
摘要: 灾难性遗忘是在线连续图学习(OCGL)的主要障碍之一,其中节点逐个到达,分布漂移可能随时发生,而在任务特定子图上进行离线训练是不可行的。在这项工作中,我们探索了一种令人惊讶的简单但非常有效的OCGL方法:我们使用一个固定的、随机初始化的编码器通过聚合邻域信息生成稳健且富有表现力的节点嵌入,仅在线训练一个轻量级分类器。通过冻结编码器,我们消除了表示参数漂移,这是遗忘的一个关键来源,获得了既富有表现力又稳定的嵌入。在多个OCGL基准测试中评估时,尽管这种方法简单且缺乏内存缓冲区,但却比最先进的方法产生了一致的增益,令人惊讶的改进高达30%,性能常常接近联合离线训练上限。这些结果表明,在OCGL中,可以通过拥抱建筑简单性和稳定性来最小化灾难性遗忘,而无需复杂的重播或正则化。
更新时间: 2025-10-08 09:44:14
领域: cs.LG
BlackboxNLP-2025 MIB Shared Task: Exploring Ensemble Strategies for Circuit Localization Methods
The Circuit Localization track of the Mechanistic Interpretability Benchmark (MIB) evaluates methods for localizing circuits within large language models (LLMs), i.e., subnetworks responsible for specific task behaviors. In this work, we investigate whether ensembling two or more circuit localization methods can improve performance. We explore two variants: parallel and sequential ensembling. In parallel ensembling, we combine attribution scores assigned to each edge by different methods-e.g., by averaging or taking the minimum or maximum value. In the sequential ensemble, we use edge attribution scores obtained via EAP-IG as a warm start for a more expensive but more precise circuit identification method, namely edge pruning. We observe that both approaches yield notable gains on the benchmark metrics, leading to a more precise circuit identification approach. Finally, we find that taking a parallel ensemble over various methods, including the sequential ensemble, achieves the best results. We evaluate our approach in the BlackboxNLP 2025 MIB Shared Task, comparing ensemble scores to official baselines across multiple model-task combinations.
Updated: 2025-10-08 09:39:40
标题: BlackboxNLP-2025 MIB共享任务:探索电路定位方法的集成策略
摘要: 《机制可解释性基准测试(MIB)的电路定位跟踪》评估了在大型语言模型(LLMs)中定位电路的方法,即负责特定任务行为的子网络。在这项工作中,我们调查了合并两种或更多电路定位方法是否可以提高性能。我们探索了两种变体:并行合并和顺序合并。在并行合并中,我们将不同方法分配给每个边的归因分数组合起来,例如通过平均值或取最小值或最大值。在顺序合并中,我们使用通过EAP-IG获得的边缘归因分数作为更昂贵但更精确的电路识别方法的热启动,即边缘修剪。我们观察到这两种方法都在基准测试指标上取得了显着收益,从而实现了更精确的电路识别方法。最后,我们发现,对包括顺序合并在内的各种方法进行并行合并可以取得最佳结果。我们在BlackboxNLP 2025 MIB共享任务中评估了我们的方法,比较了合奏分数与多个模型-任务组合的官方基线。
更新时间: 2025-10-08 09:39:40
领域: cs.CL,cs.LG
Unlocking Dataset Distillation with Diffusion Models
Dataset distillation seeks to condense datasets into smaller but highly representative synthetic samples. While diffusion models now lead all generative benchmarks, current distillation methods avoid them and rely instead on GANs or autoencoders, or, at best, sampling from a fixed diffusion prior. This trend arises because naive backpropagation through the long denoising chain leads to vanishing gradients, which prevents effective synthetic sample optimization. To address this limitation, we introduce Latent Dataset Distillation with Diffusion Models (LD3M), the first method to learn gradient-based distilled latents and class embeddings end-to-end through a pre-trained latent diffusion model. A linearly decaying skip connection, injected from the initial noisy state into every reverse step, preserves the gradient signal across dozens of timesteps without requiring diffusion weight fine-tuning. Across multiple ImageNet subsets at 128x128 and 256x256, LD3M improves downstream accuracy by up to 4.8 percentage points (1 IPC) and 4.2 points (10 IPC) over the prior state-of-the-art. The code for LD3M is provided at https://github.com/Brian-Moser/prune_and_distill.
Updated: 2025-10-08 09:38:25
标题: 使用扩散模型解锁数据集精炼
摘要: 数据集提炼旨在将数据集压缩为更小但高度代表性的合成样本。尽管扩散模型现在领先于所有生成基准,但当前的提炼方法避免使用它们,而是依赖于GAN或自动编码器,或者最好的情况下,从固定扩散先验中进行采样。这种趋势是因为通过长时间去噪链的朴素反向传播导致梯度消失,从而阻止有效的合成样本优化。为了解决这一限制,我们引入了具有扩散模型的潜在数据集提炼(LD3M),这是第一种通过预训练的潜在扩散模型端到端学习基于梯度的提炼潜在和类嵌入的方法。线性衰减的跳跃连接,从初始嘈杂状态注入到每个反向步骤中,保留了在不需要扩散权重微调的情况下跨数十个时间步的梯度信号。在多个128x128和256x256的ImageNet子集中,LD3M相比先前的最先进技术提高了高达4.8个百分点(1 IPC)和4.2个百分点(10 IPC)的下游准确性。LD3M的代码可在https://github.com/Brian-Moser/prune_and_distill找到。
更新时间: 2025-10-08 09:38:25
领域: cs.CV,cs.AI,cs.LG
Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
The 3D occupancy prediction task has witnessed remarkable progress in recent years, playing a crucial role in vision-based autonomous driving systems. While traditional methods are limited to fixed semantic categories, recent approaches have moved towards predicting text-aligned features to enable open-vocabulary text queries in real-world scenes. However, there exists a trade-off in text-aligned scene modeling: sparse Gaussian representation struggles to capture small objects in the scene, while dense representation incurs significant computational overhead. To address these limitations, we present PG-Occ, an innovative Progressive Gaussian Transformer Framework that enables open-vocabulary 3D occupancy prediction. Our framework employs progressive online densification, a feed-forward strategy that gradually enhances the 3D Gaussian representation to capture fine-grained scene details. By iteratively enhancing the representation, the framework achieves increasingly precise and detailed scene understanding. Another key contribution is the introduction of an anisotropy-aware sampling strategy with spatio-temporal fusion, which adaptively assigns receptive fields to Gaussians at different scales and stages, enabling more effective feature aggregation and richer scene information capture. Through extensive evaluations, we demonstrate that PG-Occ achieves state-of-the-art performance with a relative 14.3% mIoU improvement over the previous best performing method. Code and pretrained models will be released upon publication on our project page: https://yanchi-3dv.github.io/PG-Occ
Updated: 2025-10-08 09:34:48
标题: 具有各向异性感知采样的渐进高斯变换器用于开放词汇占用预测
摘要: 近年来,3D占用预测任务取得了显著进展,在基于视觉的自动驾驶系统中发挥着关键作用。传统方法局限于固定的语义类别,而最近的方法已经开始向预测与文本对齐的特征转变,以实现在现实场景中进行开放词汇的文本查询。然而,在与文本对齐的场景建模中存在一个折衷:稀疏的高斯表示难以捕捉场景中的小物体,而密集表示会带来显著的计算开销。为了解决这些限制,我们提出了PG-Occ,一种创新的渐进高斯变换器框架,实现了开放词汇的3D占用预测。我们的框架采用了渐进在线致密化的前馈策略,逐步增强3D高斯表示以捕捉细粒度的场景细节。通过迭代增强表示,该框架实现了越来越精确和详细的场景理解。另一个关键贡献是引入了一种具有时空融合的各向异性感知采样策略,自适应地将感受野分配给不同尺度和阶段的高斯,实现更有效的特征聚合和更丰富的场景信息捕捉。通过大量评估,我们展示了PG-Occ相对于先前表现最佳方法的mIoU相对提高了14.3%的最新性能。代码和预训练模型将在我们的项目页面发布时发布:https://yanchi-3dv.github.io/PG-Occ
更新时间: 2025-10-08 09:34:48
领域: cs.CV,cs.AI
ExLLM: Experience-Enhanced LLM Optimization for Molecular Design and Beyond
Molecular design involves an enormous and irregular search space, where traditional optimizers such as Bayesian optimization, genetic algorithms, and generative models struggle to leverage expert knowledge or handle complex feedback. Recently, LLMs have been used as optimizers, achieving promising results on benchmarks such as PMO. However, existing approaches rely only on prompting or extra training, without mechanisms to handle complex feedback or maintain scalable memory. In particular, the common practice of appending or summarizing experiences at every query leads to redundancy, degraded exploration, and ultimately poor final outcomes under large-scale iterative search. We introduce ExLLM (Experience-Enhanced LLM optimization), an LLM-as-optimizer framework with three components: (1) a compact, evolving experience snippet tailored to large discrete spaces that distills non-redundant cues and improves convergence at low cost; (2) a simple yet effective k-offspring scheme that widens exploration per call and reduces orchestration cost; and (3) a lightweight feedback adapter that normalizes objectives for selection while formatting constraints and expert hints for iteration. ExLLM sets new state-of-the-art results on PMO and generalizes strongly in our setup, it sets records on circle packing and stellarator design, and yields consistent gains across additional domains requiring only a task-description template and evaluation functions to transfer.
Updated: 2025-10-08 09:32:42
标题: ExLLM:经验增强的LLM优化用于分子设计及其它领域
摘要: 分子设计涉及一个巨大且不规则的搜索空间,传统的优化器如贝叶斯优化、遗传算法和生成模型很难利用专家知识或处理复杂的反馈。最近,LLMs被用作优化器,在PMO等基准测试中取得了令人期待的结果。然而,现有方法仅依赖提示或额外训练,缺乏处理复杂反馈或保持可扩展内存的机制。特别是,在每个查询时追加或总结经验的常见做法会导致冗余、探索能力下降,并最终在大规模迭代搜索下产生不佳的最终结果。我们引入了ExLLM(经验增强LLM优化),这是一个LLM作为优化器框架,包括三个组成部分:(1)一个针对大离散空间量身定制的紧凑、不断演变的经验片段,提炼非冗余线索并以低成本改善收敛性;(2)一个简单而有效的k-后代方案,扩大每次调用的探索范围,并减少编排成本;(3)一个轻量级的反馈适配器,为选择标准化目标,同时为迭代格式化约束和专家提示。ExLLM在PMO上取得了新的最新成果,并在我们的设置中具有很强的泛化能力,它在圆形填充和星际磁约束器设计上创下了记录,并在需要仅使用任务描述模板和评估函数进行转移的其他领域中产生了一致的收益。
更新时间: 2025-10-08 09:32:42
领域: cs.LG
Quantum Computing Methods for Malware Detection
In this paper, we explore the potential of quantum computing in enhancing malware detection through the application of Quantum Machine Learning (QML). Our main objective is to investigate the performance of the Quantum Support Vector Machine (QSVM) algorithm compared to SVM. A publicly available dataset containing raw binaries of Portable Executable (PE) files was used for the classification. The QSVM algorithm, incorporating quantum kernels through different feature maps, was implemented and evaluated on a local simulator within the Qiskit SDK and IBM quantum computers. Experimental results from simulators and quantum hardware provide insights into the behavior and performance of quantum computers, especially in handling large-scale computations for malware detection tasks. The work summarizes the practical experience with using quantum hardware via the Qiskit interfaces. We describe in detail the critical issues encountered, as well as the fixes that had to be developed and applied to the base code of the Qiskit Machine Learning library. These issues include missing transpilation of the circuits submitted to IBM Quantum systems and exceeding the maximum job size limit due to the submission of all the circuits in one job.
Updated: 2025-10-08 09:31:31
标题: 量子计算方法用于恶意软件检测
摘要: 在本文中,我们探讨了量子计算在通过量子机器学习(QML)应用增强恶意软件检测的潜力。我们的主要目标是研究量子支持向量机(QSVM)算法与支持向量机(SVM)的性能。我们使用包含可执行文件(PE)文件的原始二进制数据集进行分类。QSVM算法通过不同的特征映射结合量子核实现,并在Qiskit SDK和IBM量子计算机上的本地模拟器上进行实施和评估。来自模拟器和量子硬件的实验结果提供了关于量子计算机行为和性能的见解,特别是在处理用于恶意软件检测任务的大规模计算方面。该工作总结了通过Qiskit接口使用量子硬件的实际经验。我们详细描述了遇到的关键问题,以及必须开发和应用于Qiskit机器学习库基础代码的修复措施。这些问题包括向IBM量子系统提交的电路的缺失转码以及由于将所有电路提交给一个作业而导致超过最大作业大小限制。
更新时间: 2025-10-08 09:31:31
领域: quant-ph,cs.LG
Estimating the Joint Probability of Scenario Parameters with Gaussian Mixture Copula Models
This paper presents the first application of Gaussian Mixture Copula Models to the statistical modeling of driving scenarios for the safety validation of automated driving systems. Knowledge of the joint probability distribution of scenario parameters is essential for scenario-based safety assessment, where risk quantification depends on the likelihood of concrete parameter combinations. Gaussian Mixture Copula Models bring together the multimodal expressivity of Gaussian Mixture Models and the flexibility of copulas, enabling separate modeling of marginal distributions and dependencies. We benchmark Gaussian Mixture Copula Models against previously proposed approaches - Gaussian Mixture Models and Gaussian Copula Models - using real-world driving data drawn from scenarios defined in United Nations Regulation No. 157. Our evaluation across approximately 18 million scenario instances demonstrates that Gaussian Mixture Copula Models consistently surpass Gaussian Copula Models and perform better than, or at least comparably to, Gaussian Mixture Models, as measured by both log-likelihood and Sinkhorn distance. These results are promising for the adoption of Gaussian Mixture Copula Models as a statistical foundation for future scenario-based validation frameworks.
Updated: 2025-10-08 09:26:20
标题: 估计使用高斯混合Copula模型的情景参数的联合概率
摘要: 本文首次将高斯混合Copula模型应用于自动驾驶系统的安全验证中的驾驶情景的统计建模。了解情景参数的联合概率分布对于基于情景的安全评估至关重要,风险量化取决于具体参数组合的可能性。高斯混合Copula模型将高斯混合模型的多模表达能力和copula的灵活性结合在一起,可以分别建模边际分布和依赖关系。我们使用从联合国第157号规定中定义的情景中提取的真实驾驶数据,将高斯混合Copula模型与先前提出的方法 - 高斯混合模型和高斯Copula模型进行基准测试。我们评估了大约1800万个情景实例,结果表明高斯混合Copula模型始终优于高斯Copula模型,并且在对数似然和Sinkhorn距离的衡量下表现优于或至少与高斯混合模型相当。这些结果为采用高斯混合Copula模型作为未来基于情景的验证框架的统计基础是有希望的。
更新时间: 2025-10-08 09:26:20
领域: cs.RO,cs.LG
Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness
Models are susceptible to adversarially out-of-distribution (OOD) data despite large training-compute investments into their robustification. Zaremba et al. (2025) make progress on this problem at test time, showing LLM reasoning improves satisfaction of model specifications designed to thwart attacks, resulting in a correlation between reasoning effort and robustness to jailbreaks. However, this benefit of test compute fades when attackers are given access to gradients or multimodal inputs. We address this gap, clarifying that inference-compute offers benefits even in such cases. Our approach argues that compositional generalization, through which OOD data is understandable via its in-distribution (ID) components, enables adherence to defensive specifications on adversarially OOD inputs. Namely, we posit the Robustness from Inference Compute Hypothesis (RICH): inference-compute defenses profit as the model's training data better reflects the attacked data's components. We empirically support this hypothesis across vision language model and attack types, finding robustness gains from test-time compute if specification following on OOD data is unlocked by compositional generalization, while RL finetuning and protracted reasoning are not critical. For example, increasing emphasis on defensive specifications via prompting lowers the success rate of gradient-based multimodal attacks on VLMs robustified by adversarial pretraining, but this same intervention provides no such benefit to not-robustified models. This correlation of inference-compute's robustness benefit with base model robustness is the rich-get-richer dynamic of the RICH: attacked data components are more ID for robustified models, aiding compositional generalization to OOD data. Accordingly, we advise layering train-time and test-time defenses to obtain their synergistic benefit.
Updated: 2025-10-08 09:18:53
标题: 致富或尽缩减:盈利交易推断计算以提高鲁棒性
摘要: 尽管在提高鲁棒性方面进行了大量的训练和计算投资,但模型仍然容易受到对抗性的分布外(OOD)数据的影响。Zaremba等人(2025年)在测试时取得了进展,展示了LLM推理如何改善满足旨在阻挡攻击的模型规范,从而导致推理工作量与防止越狱的鲁棒性之间存在相关性。然而,当攻击者可以访问梯度或多模态输入时,测试计算的好处会逐渐消失。我们解决了这一差距,澄清了即使在这些情况下,推理计算也能带来好处。我们的方法认为,通过组合泛化,使OOD数据可通过其ID组件理解,从而使模型能够遵守对抗性OOD输入的防御规范。换句话说,我们提出了推理计算防御受益于模型的训练数据更好地反映被攻击数据组件的假设(RICH):如果规范遵循OOD数据解锁,则从推理计算获益,而基于RL微调和长时间推理并不关键。例如,通过加强提示对防御规范的重视降低了对通过对抗性预训练进行鲁棒化的VLMs的梯度基攻击的成功率,但对不经过鲁棒化的模型却没有提供类似的好处。推理计算的鲁棒性好处与基础模型鲁棒性之间的相关性是RICH的富者更富动态:被攻击的数据组件对于经过鲁棒化的模型更具ID属性,有助于将OOD数据进行组合泛化。因此,我们建议将训练时和测试时的防御措施结合起来,以获得它们协同作用的好处。
更新时间: 2025-10-08 09:18:53
领域: cs.LG
VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction
In-Context Operator Networks (ICONs) have demonstrated the ability to learn operators across diverse partial differential equations using few-shot, in-context learning. However, existing ICONs process each spatial point as an individual token, severely limiting computational efficiency when handling dense data in higher spatial dimensions. We propose Vision In-Context Operator Networks (VICON), which integrates vision transformer architectures to efficiently process 2D data through patch-wise operations while preserving ICON's adaptability to multiphysics systems and varying timesteps. Evaluated across three fluid dynamics benchmarks, VICON significantly outperforms state-of-the-art baselines: DPOT and MPP, reducing the averaged last-step rollout error by 37.9% compared to DPOT and 44.7% compared to MPP, while requiring only 72.5% and 34.8% of their respective inference times. VICON naturally supports flexible rollout strategies with varying timestep strides, enabling immediate deployment in imperfect measurement systems where sampling frequencies may differ or frames might be dropped - common challenges in real-world settings - without requiring retraining or interpolation. In these realistic scenarios, VICON exhibits remarkable robustness, experiencing only 24.41% relative performance degradation compared to 71.37%-74.49% degradation in baseline methods, demonstrating its versatility for deploying in realistic applications. Our scripts for processing datasets and code are publicly available at https://github.com/Eydcao/VICON.
Updated: 2025-10-08 09:18:34
标题: VICON:用于多物理流体动力学预测的视觉上下文操作员网络
摘要: In-Context Operator Networks(ICONs)已经证明能够通过少量样本的、在上下文中的学习,学习各种偏微分方程中的操作符。然而,现有的ICONs将每个空间点处理为一个独立的标记,这在处理高维密集数据时严重限制了计算效率。我们提出了Vision In-Context Operator Networks(VICON),它将视觉变换器架构整合到其中,通过分块操作有效处理2D数据,同时保留ICON对多物理系统和不同时间步长的适应性。在三个流体动力学基准测试中评估,VICON明显优于最先进的基线模型:DPOT和MPP,相比于DPOT降低了37.9%的平均最后步骤回滚误差,相比于MPP降低了44.7%,同时仅需它们分别推理时间的72.5%和34.8%。VICON自然支持具有不同时间步长的灵活回滚策略,使其能够立即部署在采样频率可能不同或可能丢失帧的不完美测量系统中——这是现实世界环境中常见的挑战——而无需重新训练或插值。在这些现实场景中,VICON表现出了卓越的鲁棒性,相对于基线方法的71.37%-74.49%的性能降级,仅经历了24.41%的相对性能降级,展示了其在实际应用中的多功能性。我们的数据集处理脚本和代码公开可用在https://github.com/Eydcao/VICON。
更新时间: 2025-10-08 09:18:34
领域: cs.LG,cs.NA,math.NA,physics.flu-dyn
Bionetta: Efficient Client-Side Zero-Knowledge Machine Learning Proving
In this report, we compare the performance of our UltraGroth-based zero-knowledge machine learning framework Bionetta to other tools of similar purpose such as EZKL, Lagrange's deep-prove, or zkml. The results show a significant boost in the proving time for custom-crafted neural networks: they can be proven even on mobile devices, enabling numerous client-side proving applications. While our scheme increases the cost of one-time preprocessing steps, such as circuit compilation and generating trusted setup, our approach is, to the best of our knowledge, the only one that is deployable on the native EVM smart contracts without overwhelming proof size and verification overheads.
Updated: 2025-10-08 09:10:32
标题: Bionetta:高效的客户端零知识机器学习证明
摘要: 在本报告中,我们将基于UltraGroth的零知识机器学习框架Bionetta与其他类似目的的工具进行了比较,例如EZKL、Lagrange的deep-prove或zkml。结果显示,对于定制的神经网络,证明时间显著提升:它们甚至可以在移动设备上进行证明,从而实现众多客户端证明应用。虽然我们的方案增加了一次性预处理步骤的成本,例如电路编译和生成可信设置,但据我们所知,我们的方法是唯一一个可以部署在原生EVM智能合约上的,而不会带来巨大的证明大小和验证开销。
更新时间: 2025-10-08 09:10:32
领域: cs.CR,cs.CV
Evil twins are not that evil: Qualitative insights into machine-generated prompts
It has been widely observed that language models (LMs) respond in predictable ways to algorithmically generated prompts that are seemingly unintelligible. This is both a sign that we lack a full understanding of how LMs work, and a practical challenge, because opaqueness can be exploited for harmful uses of LMs, such as jailbreaking. We present the first thorough analysis of opaque machine-generated prompts, or autoprompts, pertaining to 6 LMs of different sizes and families. We find that machine-generated prompts are characterized by a last token that is often intelligible and strongly affects the generation. A small but consistent proportion of the previous tokens are prunable, probably appearing in the prompt as a by-product of the fact that the optimization process fixes the number of tokens. The remaining tokens fall into two categories: filler tokens, which can be replaced with semantically unrelated substitutes, and keywords, that tend to have at least a loose semantic relation with the generation, although they do not engage in well-formed syntactic relations with it. Additionally, human experts can reliably identify the most influential tokens in an autoprompt a posteriori, suggesting these prompts are not entirely opaque. Finally, some of the ablations we applied to autoprompts yield similar effects in natural language inputs, suggesting that autoprompts emerge naturally from the way LMs process linguistic inputs in general.
Updated: 2025-10-08 09:07:59
标题: 邪恶的双胞胎并不那么邪恶:对机器生成提示的定性洞察
摘要: 这些文献摘要表明,语言模型(LMs)对算法生成的似乎难以理解的提示作出可预测的反应。这既表明我们对LMs的工作原理还没有完全理解,也是一个实际挑战,因为模糊性可以被利用来对LMs进行有害的使用,比如越狱。我们首次对不透明的机器生成提示,或者自动生成的提示进行了彻底分析,涉及了6个不同规模和家族的LMs。我们发现,机器生成的提示的特点是最后一个标记通常是可以理解的,并且强烈影响生成。先前标记的一小部分是可修剪的,可能出现在提示中是因为优化过程固定了标记数。其余标记分为两类:填充标记,可以用语义无关的替代品替换,和关键词,倾向于与生成物至少具有松散的语义关系,尽管它们与之没有形成良好的句法关系。此外,人类专家可以可靠地事后识别自动生成提示中最有影响力的标记,这表明这些提示并非完全不透明。最后,我们对自动生成提示所应用的一些消融在自然语言输入中产生了类似的效果,这表明自动生成提示自然地从LMs处理语言输入的方式中产生。
更新时间: 2025-10-08 09:07:59
领域: cs.CL,cs.AI,cs.LG
Token-based Audio Inpainting via Discrete Diffusion
Audio inpainting seeks to restore missing segments in degraded recordings. Previous diffusion-based methods exhibit impaired performance when the missing region is large. We introduce the first approach that applies discrete diffusion over tokenized music representations from a pre-trained audio tokenizer, enabling stable and semantically coherent restoration of long gaps. Our method further incorporates two training approaches: a derivative-based regularization loss that enforces smooth temporal dynamics, and a span-based absorbing transition that provides structured corruption during diffusion. Experiments on the MusicNet and MAESTRO datasets with gaps up to 750 ms show that our approach consistently outperforms strong baselines across range of gap lengths, for gaps of 150 ms and above. This work advances musical audio restoration and introduces new directions for discrete diffusion model training. Audio examples of our proposed method can be found at https://iftach21.github.io/.
Updated: 2025-10-08 09:01:13
标题: 基于令牌的音频修复:通过离散扩散进行
摘要: 音频修复旨在恢复受损录音中的缺失部分。先前基于扩散的方法在缺失区域较大时表现不佳。我们介绍了一种首次应用离散扩散的方法,该方法基于预训练音频分词器对音乐表示进行标记,从而实现对长间隔的稳定和语义连贯的恢复。我们的方法进一步融合了两种训练方法:一种基于导数的正则化损失,强化平滑的时间动态,以及一种基于跨度的吸收过渡,在扩散过程中提供结构化的破坏。在MusicNet和MAESTRO数据集上进行的实验表明,我们的方法在750毫秒的间隔范围内始终优于强基线方法,对150毫秒及以上的间隔效果尤为显著。这项工作推动了音乐音频修复的发展,并为离散扩散模型训练开辟了新的方向。我们提出方法的音频示例可在https://iftach21.github.io/找到。
更新时间: 2025-10-08 09:01:13
领域: cs.SD,cs.AI,cs.IT,cs.LG,eess.AS,math.IT
Modeling COVID-19 Dynamics in German States Using Physics-Informed Neural Networks
The COVID-19 pandemic has highlighted the need for quantitative modeling and analysis to understand real-world disease dynamics. In particular, post hoc analyses using compartmental models offer valuable insights into the effectiveness of public health interventions, such as vaccination strategies and containment policies. However, such compartmental models like SIR (Susceptible-Infectious-Recovered) often face limitations in directly incorporating noisy observational data. In this work, we employ Physics-Informed Neural Networks (PINNs) to solve the inverse problem of the SIR model using infection data from the Robert Koch Institute (RKI). Our main contribution is a fine-grained, spatio-temporal analysis of COVID-19 dynamics across all German federal states over a three-year period. We estimate state-specific transmission and recovery parameters and time-varying reproduction number (R_t) to track the pandemic progression. The results highlight strong variations in transmission behavior across regions, revealing correlations with vaccination uptake and temporal patterns associated with major pandemic phases. Our findings demonstrate the utility of PINNs in localized, long-term epidemiological modeling.
Updated: 2025-10-08 08:59:39
标题: 用物理信息神经网络对德国各州COVID-19动态进行建模
摘要: 新冠肺炎大流行凸显了量化建模和分析的必要性,以理解现实世界疾病动态。特别是,使用隔离模型进行事后分析可以为公共卫生干预措施(如疫苗接种策略和遏制政策)的有效性提供宝贵见解。然而,诸如SIR(易感-感染-康复)之类的隔离模型通常面临直接整合嘈杂观察数据的局限性。在本研究中,我们利用物理信息神经网络(PINNs)来解决SIR模型的逆问题,使用来自罗伯特科赫研究所(RKI)的感染数据。我们的主要贡献是对德国所有联邦州在三年期间的COVID-19动态进行细粒度的时空分析。我们估计了各州的传播和恢复参数以及时变的繁殖数(R_t)以跟踪大流行的进展。结果凸显了不同地区传播行为的强烈变化,揭示了与疫苗接种率相关的相关性以及与主要大流行阶段相关的时间模式。我们的发现展示了PINNs在局部化、长期流行病学建模中的实用性。
更新时间: 2025-10-08 08:59:39
领域: cs.LG,cs.AI
Function regression using the forward forward training and inferring paradigm
Function regression/approximation is a fundamental application of machine learning. Neural networks (NNs) can be easily trained for function regression using a sufficient number of neurons and epochs. The forward-forward learning algorithm is a novel approach for training neural networks without backpropagation, and is well suited for implementation in neuromorphic computing and physical analogs for neural networks. To the best of the authors' knowledge, the Forward Forward paradigm of training and inferencing NNs is currently only restricted to classification tasks. This paper introduces a new methodology for approximating functions (function regression) using the Forward-Forward algorithm. Furthermore, the paper evaluates the developed methodology on univariate and multivariate functions, and provides preliminary studies of extending the proposed Forward-Forward regression to Kolmogorov Arnold Networks, and Deep Physical Neural Networks.
Updated: 2025-10-08 08:41:14
标题: 使用前向训练和推断范式进行函数回归
摘要: 功能回归/逼近是机器学习的一个基本应用。神经网络(NNs)可以通过足够数量的神经元和时代轻松训练用于函数回归。前向学习算法是一种新颖的方法,用于训练神经网络而不需要反向传播,并且非常适合在神经形态计算和神经网络的物理类比中实现。据作者所知,目前只有分类任务才使用前向学习范式来训练和推理NNs。本文介绍了一种使用前向学习算法逼近函数(函数回归)的新方法。此外,本文评估了开发的方法论在单变量和多变量函数上的表现,并提供了将所提出的前向回归扩展到科尔莫哥洛夫·阿诺德网络和深度物理神经网络的初步研究。
更新时间: 2025-10-08 08:41:14
领域: cs.LG
Is Supervised Learning Really That Different from Unsupervised?
We demonstrate how supervised learning can be decomposed into a two-stage procedure, where (1) all model parameters are selected in an unsupervised manner, and (2) the outputs y are added to the model, without changing the parameter values. This is achieved by a new model selection criterion that, in contrast to cross-validation, can be used also without access to y. For linear ridge regression, we bound the asymptotic out-of-sample risk of our method in terms of the optimal asymptotic risk. We also demonstrate on real and synthetic data that versions of linear and kernel ridge regression, smoothing splines, and neural networks, which are trained without access to y, perform similarly to their standard y-based counterparts. Hence, our results suggest that the difference between supervised and unsupervised learning is less fundamental than it may appear.
Updated: 2025-10-08 08:28:20
标题: 监督学习真的与无监督学习有很大不同吗?
摘要: 我们展示了如何将监督学习分解为一个两阶段过程,其中(1)所有模型参数以非监督方式选择,并且(2)输出y被添加到模型中,而不改变参数值。这是通过一个新的模型选择标准实现的,与交叉验证相比,它也可以在没有访问y的情况下使用。对于线性岭回归,我们以最优渐近风险的形式限制了我们方法的渐近样本外风险。我们还在真实和合成数据上展示了线性和核岭回归、平滑样条和神经网络的版本,这些版本在没有访问y的情况下训练时表现与它们标准基于y的对应物类似。因此,我们的结果表明,监督学习和非监督学习之间的差异可能不像看起来的那样根本。
更新时间: 2025-10-08 08:28:20
领域: stat.ML,cs.LG
Achieving Hyperbolic-Like Expressiveness with Arbitrary Euclidean Regions: A New Approach to Hierarchical Embeddings
Hierarchical data is common in many domains like life sciences and e-commerce, and its embeddings often play a critical role. While hyperbolic embeddings offer a theoretically grounded approach to representing hierarchies in low-dimensional spaces, current methods often rely on specific geometric constructs as embedding candidates. This reliance limits their generalizability and makes it difficult to integrate with techniques that model semantic relationships beyond pure hierarchies, such as ontology embeddings. In this paper, we present RegD, a flexible Euclidean framework that supports the use of arbitrary geometric regions -- such as boxes and balls -- as embedding representations. Although RegD operates entirely in Euclidean space, we formally prove that it achieves hyperbolic-like expressiveness by incorporating a depth-based dissimilarity between regions, enabling it to emulate key properties of hyperbolic geometry, including exponential growth. Our empirical evaluation on diverse real-world datasets shows consistent performance gains over state-of-the-art methods and demonstrates RegD's potential for broader applications such as the ontology embedding task that goes beyond hierarchy.
Updated: 2025-10-08 08:26:07
标题: 利用任意欧几里得区域实现类似双曲线表达的方法:一种新的分层嵌入方法
摘要: 分层数据在许多领域中很常见,如生命科学和电子商务,其嵌入通常起着关键作用。虽然双曲嵌入提供了一个在低维空间中表示层次结构的理论基础,但当前方法通常依赖于特定的几何构造作为嵌入候选。这种依赖限制了它们的泛化能力,并使其难以与模拟超出纯层次结构的语义关系的技术集成,如本体嵌入。在本文中,我们提出了RegD,这是一个灵活的欧几里得框架,支持将任意几何区域(如盒子和球)用作嵌入表示。尽管RegD完全在欧几里得空间中运行,我们正式证明它通过在区域之间加入基于深度的差异性,实现了类似双曲的表现力,使其能够模拟双曲几何的关键属性,包括指数增长。我们在多样的真实世界数据集上进行的实证评估表明,与最先进的方法相比,RegD显示出一致的性能提升,并展示了RegD在更广泛应用方面的潜力,例如超越层次的本体嵌入任务。
更新时间: 2025-10-08 08:26:07
领域: cs.LG,cs.AI
GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning
Foundation models have driven remarkable progress in text, vision, and video understanding, and are now poised to unlock similar breakthroughs in trajectory modeling. We introduce the GPSMasked Trajectory Transformer (GPS-MTM), a foundation model for large-scale mobility data that captures patterns of normalcy in human movement. Unlike prior approaches that flatten trajectories into coordinate streams, GPS-MTM decomposes mobility into two complementary modalities: states (point-of-interest categories) and actions (agent transitions). Leveraging a bi-directional Transformer with a self-supervised masked modeling objective, the model reconstructs missing segments across modalities, enabling it to learn rich semantic correlations without manual labels. Across benchmark datasets, including Numosim-LA, Urban Anomalies, and Geolife, GPS-MTM consistently outperforms on downstream tasks such as trajectory infilling and next-stop prediction. Its advantages are most pronounced in dynamic tasks (inverse and forward dynamics), where contextual reasoning is critical. These results establish GPS-MTM as a robust foundation model for trajectory analytics, positioning mobility data as a first-class modality for large-scale representation learning. Code is released for further reference.
Updated: 2025-10-08 08:21:22
标题: GPS-MTM:利用自监督学习捕捉GPS轨迹中的正常模式
摘要: 基础模型在文本、视觉和视频理解领域取得了显著进展,现在正准备在轨迹建模领域实现类似的突破。我们引入了GPSMasked Trajectory Transformer(GPS-MTM),这是一个针对大规模移动数据的基础模型,捕捉了人类移动中的正常模式。与先前的方法不同,该模型将移动性分解为两种互补的方式:状态(兴趣点类别)和动作(代理转换)。利用具有自监督掩蔽建模目标的双向Transformer,该模型跨模态重构缺失的片段,使其能够学习丰富的语义相关性而无需手动标签。在基准数据集(包括Numosim-LA、Urban Anomalies和Geolife等)上,GPS-MTM在轨迹填充和下一站预测等下游任务上一直表现优异。它的优势在动态任务(逆向和正向动力学)中表现最为显著,其中上下文推理至关重要。这些结果确立了GPS-MTM作为轨迹分析的稳健基础模型,将移动数据定位为大规模表示学习的一流模式。代码已发布供进一步参考。
更新时间: 2025-10-08 08:21:22
领域: cs.LG,cs.AI,cs.CV,cs.MA
EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models
We present EvalMORAAL, a transparent chain-of-thought (CoT) framework that uses two scoring methods (log-probabilities and direct ratings) plus a model-as-judge peer review to evaluate moral alignment in 20 large language models. We assess models on the World Values Survey (55 countries, 19 topics) and the PEW Global Attitudes Survey (39 countries, 8 topics). With EvalMORAAL, top models align closely with survey responses (Pearson's r approximately 0.90 on WVS). Yet we find a clear regional difference: Western regions average r=0.82 while non-Western regions average r=0.61 (a 0.21 absolute gap), indicating consistent regional bias. Our framework adds three parts: (1) two scoring methods for all models to enable fair comparison, (2) a structured chain-of-thought protocol with self-consistency checks, and (3) a model-as-judge peer review that flags 348 conflicts using a data-driven threshold. Peer agreement relates to survey alignment (WVS r=0.74, PEW r=0.39, both p<.001), supporting automated quality checks. These results show real progress toward culture-aware AI while highlighting open challenges for use across regions.
Updated: 2025-10-08 08:03:38
标题: EvalMORAAL:可解释的思维链和以LLM为评判者的道德对齐评估在大型语言模型中
摘要: 我们提出了EvalMORAAL,这是一个透明的思维链(CoT)框架,使用两种评分方法(对数概率和直接评分)以及模型作为评审来评估20个大型语言模型的道德一致性。我们在世界价值观调查(55个国家,19个主题)和PEW全球态度调查(39个国家,8个主题)上评估模型。通过EvalMORAAL,顶尖模型与调查结果密切一致(在WVS上的Pearson's r约为0.90)。然而,我们发现明显的区域差异:西方地区平均r=0.82,而非西方地区平均r=0.61(绝对差距为0.21),表明存在一致的区域偏见。我们的框架增加了三个部分:(1)为所有模型提供两种评分方法以实现公平比较,(2)具有自洽性检查的结构化思维链协议,以及(3)使用数据驱动的阈值标记348个冲突的模型作为评审。同行间的一致性与调查结果的一致性相关(WVS r=0.74,PEW r=0.39,均为p<.001),支持自动化质量检查。这些结果显示出了朝着文化意识型AI的真正进展,同时突出了在各个地区使用中存在的挑战。
更新时间: 2025-10-08 08:03:38
领域: cs.CL,cs.AI
MultiCNKG: Integrating Cognitive Neuroscience, Gene, and Disease Knowledge Graphs Using Large Language Models
The advent of large language models (LLMs) has revolutionized the integration of knowledge graphs (KGs) in biomedical and cognitive sciences, overcoming limitations in traditional machine learning methods for capturing intricate semantic links among genes, diseases, and cognitive processes. We introduce MultiCNKG, an innovative framework that merges three key knowledge sources: the Cognitive Neuroscience Knowledge Graph (CNKG) with 2.9K nodes and 4.3K edges across 9 node types and 20 edge types; Gene Ontology (GO) featuring 43K nodes and 75K edges in 3 node types and 4 edge types; and Disease Ontology (DO) comprising 11.2K nodes and 8.8K edges with 1 node type and 2 edge types. Leveraging LLMs like GPT-4, we conduct entity alignment, semantic similarity computation, and graph augmentation to create a cohesive KG that interconnects genetic mechanisms, neurological disorders, and cognitive functions. The resulting MultiCNKG encompasses 6.9K nodes across 5 types (e.g., Genes, Diseases, Cognitive Processes) and 11.3K edges spanning 7 types (e.g., Causes, Associated with, Regulates), facilitating a multi-layered view from molecular to behavioral domains. Assessments using metrics such as precision (85.20%), recall (87.30%), coverage (92.18%), graph consistency (82.50%), novelty detection (40.28%), and expert validation (89.50%) affirm its robustness and coherence. Link prediction evaluations with models like TransE (MR: 391, MRR: 0.411) and RotatE (MR: 263, MRR: 0.395) show competitive performance against benchmarks like FB15k-237 and WN18RR. This KG advances applications in personalized medicine, cognitive disorder diagnostics, and hypothesis formulation in cognitive neuroscience.
Updated: 2025-10-08 07:59:32
标题: MultiCNKG:使用大型语言模型集成认知神经科学、基因和疾病知识图
摘要: 大语言模型(LLMs)的出现在生物医学和认知科学领域中的知识图谱(KGs)整合中引起了革命,克服了传统机器学习方法在捕捉基因、疾病和认知过程之间复杂语义链接方面的局限性。我们介绍了MultiCNKG,这是一个创新框架,将三个关键知识源合并在一起:认知神经科学知识图(CNKG)具有2.9K个节点和4.3K个边跨越9种节点类型和20种边类型;基因本体(GO)包括43K个节点和75K个边在3种节点类型和4种边类型;疾病本体(DO)包括11.2K个节点和8.8K个边,具有1种节点类型和2种边类型。利用诸如GPT-4之类的LLMs,我们进行实体对齐、语义相似度计算和图增强,创建一个连贯的KG,连接遗传机制、神经系统疾病和认知功能。由此产生的MultiCNKG包括5种类型(例如基因、疾病、认知过程)的6.9K个节点和7种类型(例如原因、相关性、调节)的11.3K个边,促进了从分子到行为领域的多层次视图。使用精度(85.20%)、召回率(87.30%)、覆盖率(92.18%)、图一致性(82.50%)、新颖性检测(40.28%)和专家验证(89.50%)等指标进行评估,证实了其稳健性和连贯性。与TransE(MR:391,MRR:0.411)和RotatE(MR:263,MRR:0.395)等模型进行链接预测评估显示出与FB15k-237和WN18RR等基准的竞争性表现。这个KG推动了个性化医学、认知障碍诊断和认知神经科学假设制定等应用的发展。
更新时间: 2025-10-08 07:59:32
领域: cs.AI,cs.LG
2 OLMo 2 Furious
We present OLMo 2, the next generation of our fully open language models. OLMo 2 includes a family of dense autoregressive language models at 7B, 13B and 32B scales with fully released artifacts -- model weights, full training data, training code and recipes, training logs and thousands of intermediate checkpoints. In this work, we describe our modified model architecture and training recipe, focusing on techniques for achieving better training stability and improved per-token efficiency. Our updated pretraining data mixture introduces a new, specialized data mix called Dolmino Mix 1124, which significantly improves model capabilities across many downstream task benchmarks when introduced via late-stage curriculum training (i.e. specialized data during the annealing phase of pretraining). Finally, we incorporate best practices from T\"ulu 3 to develop OLMo 2-Instruct, focusing on permissive data and extending our final-stage reinforcement learning with verifiable rewards (RLVR). Our OLMo 2 base models sit at the Pareto frontier of performance to training compute, often matching or outperforming open-weight only models like Llama 3.1, Qwen 2.5, and Gemma 2 while using fewer FLOPs and with fully transparent training data, code, and recipe. Our fully open OLMo 2-Instruct models are competitive with open-weight only models of comparable size and even some proprietary models like GPT-3.5 Turbo and GPT 4o Mini.
Updated: 2025-10-08 07:50:45
标题: 2 OLMo 2 Furious-->2 OLMo 2 狂怒
摘要: 我们呈现OLMo 2,我们完全开放的语言模型的下一代。 OLMo 2包括一个家族的密集自回归语言模型,规模为7B、13B和32B,具有完全释放的工件——模型权重、完整训练数据、训练代码和配方、训练日志和成千上万的中间检查点。在这项工作中,我们描述了我们修改后的模型架构和训练配方,重点放在实现更好的训练稳定性和改进每个令牌效率的技术上。我们更新的预训练数据混合引入了一个新的、专门的数据混合,称为Dolmino Mix 1124,当通过后期课程训练(即预训练的退火阶段专门数据)引入时,显著提高了模型在许多下游任务基准上的能力。最后,我们结合T\"ulu 3的最佳实践开发OLMo 2-Instruct,侧重于允许性数据,并通过可验证奖励(RLVR)扩展我们的最终阶段强化学习。我们的OLMo 2基础模型坐在性能到训练计算的帕累托边界上,通常能够匹配或胜过仅使用开放权重的模型,如Llama 3.1、Qwen 2.5和Gemma 2,同时使用更少的浮点操作数,并具有完全透明的训练数据、代码和配方。我们完全开放的OLMo 2-Instruct模型与可比大小的仅开放权重模型和一些专有模型竞争,如GPT-3.5 Turbo和GPT 4o Mini。
更新时间: 2025-10-08 07:50:45
领域: cs.CL,cs.LG
Incorporating Expert Knowledge into Bayesian Causal Discovery of Mixtures of Directed Acyclic Graphs
Bayesian causal discovery benefits from prior information elicited from domain experts, and in heterogeneous domains any prior knowledge would be badly needed. However, so far prior elicitation approaches have assumed a single causal graph and hence are not suited to heterogeneous domains. We propose a causal elicitation strategy for heterogeneous settings, based on Bayesian experimental design (BED) principles, and a variational mixture structure learning (VaMSL) method -- extending the earlier differentiable Bayesian structure learning (DiBS) method -- to iteratively infer mixtures of causal Bayesian networks (CBNs). We construct an informative graph prior incorporating elicited expert feedback in the inference of mixtures of CBNs. Our proposed method successfully produces a set of alternative causal models (mixture components or clusters), and achieves an improved structure learning performance on heterogeneous synthetic data when informed by a simulated expert. Finally, we demonstrate that our approach is capable of capturing complex distributions in a breast cancer database.
Updated: 2025-10-08 07:47:18
标题: 将专家知识融入到混合有向无环图的贝叶斯因果发现中
摘要: 贝叶斯因果发现受益于领域专家提供的先验信息,在异质领域中任何先验知识都是非常必要的。然而,到目前为止,先验引导方法假定存在单一因果图,因此不适用于异质领域。我们提出了一种针对异质设置的因果引导策略,基于贝叶斯实验设计(BED)原则,并使用变分混合结构学习(VaMSL)方法——扩展了早期的可微贝叶斯结构学习(DiBS)方法——以迭代地推断因果贝叶斯网络(CBN)的混合。我们构建了一个包含引导专家反馈的信息图先验,用于推断CBN的混合。我们提出的方法成功地生成了一组替代因果模型(混合成分或簇),并在由模拟专家提供信息时,在异质合成数据上实现了改进的结构学习性能。最后,我们证明了我们的方法能够在乳腺癌数据库中捕捉复杂的分布。
更新时间: 2025-10-08 07:47:18
领域: cs.LG,stat.ME
Distributional Machine Unlearning via Selective Data Removal
Machine learning systems increasingly face requirements to remove entire domains of information -- such as toxic language or biases -- rather than individual user data. This task presents a dilemma: full removal of the unwanted domain data is computationally expensive, while random partial removal is statistically inefficient. We find that a domain's statistical influence is often concentrated in a small subset of its data samples, suggesting a path between ineffective partial removal and unnecessary complete removal. We formalize this as distributional unlearning: a framework to select a small subset that balances forgetting an unwanted distribution while preserving a desired one. Using Kullback-Leibler divergence constraints, we derive the exact removal-preservation Pareto frontier for exponential families and prove that models trained on the edited data achieve corresponding log-loss bounds. We propose a distance-based selection algorithm and show it is quadratically more sample-efficient than random removal in the challenging low-divergence regime. Experiments across synthetic, text, and image datasets (Jigsaw, CIFAR-10, SMS spam) show our method requires 15-82% less deletion than full removal for strong unlearning effects, e.g., halving initial forget set accuracy. Ultimately, by showing a small forget set often suffices, our framework lays the foundations for more scalable and rigorous subpopulation unlearning.
Updated: 2025-10-08 07:38:34
标题: 通过选择性数据删除实现分布式机器遗忘
摘要: 机器学习系统越来越需要删除整个领域的信息,比如有毒语言或偏见,而不是个别用户数据。这项任务提出了一个困境:完全删除不需要的领域数据在计算上很昂贵,而随机部分删除在统计上效率低下。我们发现一个领域的统计影响通常集中在其数据样本的一个小子集中,这表明存在一条路径可以在部分删除无效和完全不必要删除之间找到平衡。我们将其形式化为分布式遗忘:一个框架来选择一个小子集来平衡忘记一个不需要的分布和保留一个期望的分布。使用Kullback-Leibler散度约束,我们推导出指数族的确切删除-保留帕累托前沿,并证明在编辑数据上训练的模型达到相应的对数损失边界。我们提出了一种基于距离的选择算法,并证明在具有挑战性的低散度范围内,它比随机删除更加高效。在合成、文本和图像数据集(拼图、CIFAR-10、短信垃圾)的实验中,我们的方法比完全删除需要15-82%的删除,以实现强大的遗忘效果,例如将初始遗忘集准确性减半。最终,通过展示一个小的遗忘集通常足够,我们的框架为更可扩展和严格的子群体遗忘奠定了基础。
更新时间: 2025-10-08 07:38:34
领域: cs.LG,cs.CR,stat.ML
An Empirical Analysis of the Laplace and Neural Tangent Kernels
The neural tangent kernel is a kernel function defined over the parameter distribution of an infinite width neural network. Despite the impracticality of this limit, the neural tangent kernel has allowed for a more direct study of neural networks and a gaze through the veil of their black box. More recently, it has been shown theoretically that the Laplace kernel and neural tangent kernel share the same reproducing kernel Hilbert space in the space of $\mathbb{S}^{d-1}$ alluding to their equivalence. In this work, we analyze the practical equivalence of the two kernels. We first do so by matching the kernels exactly and then by matching posteriors of a Gaussian process. Moreover, we analyze the kernels in $\mathbb{R}^d$ and experiment with them in the task of regression.
Updated: 2025-10-08 07:37:52
标题: 一个对拉普拉斯核和神经切向核的经验分析
摘要: 神经切向核是一个定义在无限宽度神经网络参数分布上的核函数。尽管这种极限在实际中并不可行,神经切向核使得对神经网络的研究更加直接,可以透过它们的黑匣子。最近,理论上已经证明拉普拉斯核和神经切向核在$\mathbb{S}^{d-1}$空间中共享相同的再生核希尔伯特空间,暗示它们的等价性。在这项工作中,我们分析了这两种核的实际等价性。我们首先通过精确匹配核函数来进行分析,然后通过匹配高斯过程的后验来进行分析。此外,我们在$\mathbb{R}^d$空间中分析这些核,并在回归任务中进行实验。
更新时间: 2025-10-08 07:37:52
领域: stat.ML,cs.LG,math.FA,math.ST,stat.TH,62M08 (Primary), 46C08 (Secondary),G.3
Epistemic Diversity and Knowledge Collapse in Large Language Models
Large language models (LLMs) tend to generate lexically, semantically, and stylistically homogenous texts. This poses a risk of knowledge collapse, where homogenous LLMs mediate a shrinking in the range of accessible information over time. Existing works on homogenization are limited by a focus on closed-ended multiple-choice setups or fuzzy semantic features, and do not look at trends across time and cultural contexts. To overcome this, we present a new methodology to measure epistemic diversity, i.e., variation in real-world claims in LLM outputs, which we use to perform a broad empirical study of LLM knowledge collapse. We test 27 LLMs, 155 topics covering 12 countries, and 200 prompt variations sourced from real user chats. For the topics in our study, we show that while newer models tend to generate more diverse claims, nearly all models are less epistemically diverse than a basic web search. We find that model size has a negative impact on epistemic diversity, while retrieval-augmented generation (RAG) has a positive impact, though the improvement from RAG varies by the cultural context. Finally, compared to a traditional knowledge source (Wikipedia), we find that country-specific claims reflect the English language more than the local one, highlighting a gap in epistemic representation
Updated: 2025-10-08 07:35:57
标题: 大型语言模型中的认知多样性和知识坍塌
摘要: 大型语言模型(LLMs)往往生成在词汇、语义和风格上同质化的文本。这带来了知识崩溃的风险,即同质化的LLMs会随着时间的推移导致可访问信息范围的缩小。现有研究在同质化方面存在局限性,集中于封闭式多项选择设置或模糊的语义特征,并未考虑时间和文化背景的趋势。为了克服这一问题,我们提出了一种新方法来衡量认识多样性,即LLM输出中现实世界主张的变化,我们利用这种方法进行了广泛的实证研究LLM知识崩溃。我们测试了27个LLMs,涵盖12个国家的155个主题,以及来源于真实用户聊天的200种提示变体。在我们研究的主题中,我们发现尽管新型模型往往生成更多样化的主张,但几乎所有模型的认知多样性都低于基本的网页搜索。我们发现模型规模对认识多样性有负面影响,而检索增强生成(RAG)有积极影响,尽管RAG的改进因文化背景而异。最后,与传统知识源(维基百科)相比,我们发现各国特定主张更多反映英语而非当地语言,突显了认识表示中的差距。
更新时间: 2025-10-08 07:35:57
领域: cs.CL,cs.AI,cs.CY,cs.IR,cs.LG
Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management
We study reinforcement learning (RL) fine-tuning of large language model (LLM) agents for long-horizon multi-turn tool use, where context length quickly becomes a fundamental bottleneck. Existing RL pipelines can suffer from degraded instruction following, excessive rollout costs, and most importantly, strict context limits. To address these challenges, we introduce summarization-based context management to training. In specific, it periodically compresses the tool using history by LLM-generated summaries that retain task-relevant information to keep a compact context while enabling the agent to scale beyond the fixed context window. Building on this formulation, we derive a policy gradient representation that seamlessly enables standard LLM RL infrastructures to optimize both tool-use behaviors as well as summarization strategies in an end-to-end fashion. We instantiate this framework with \underline{SU}mmarization augmented \underline{P}olicy \underline{O}ptimization (\texttt{SUPO}), an LLM RL algorithm that enables long-horizon training beyond a fixed context limit. Experiments on interactive function calling and searching tasks demonstrate that \texttt{SUPO} significantly improves the success rate while maintaining the same or even lower working context length compared to baselines. We also demonstrate that for complex searching tasks, \texttt{SUPO} can further improve the evaluation performance when scaling test-time maximum round of summarization beyond that of training time. Our results establish summarization-based context management as a principled and scalable approach for training RL agents beyond a fixed context length limit.
Updated: 2025-10-08 07:29:22
标题: 使用端到端摘要为基础的上下文管理,扩展LLM多轮RL
摘要: 我们研究了大型语言模型代理的强化学习(RL)微调,用于长时间跨度多轮工具使用,其中上下文长度迅速成为一个基本瓶颈。现有的RL流程可能会遭受降级的指令跟随、过高的展开成本,以及最重要的是,严格的上下文限制。为了解决这些挑战,我们引入了基于摘要的上下文管理到训练中。具体来说,它通过LLM生成的摘要周期性地压缩工具使用历史,保留任务相关信息以保持紧凑的上下文,同时使代理能够超越固定的上下文窗口。基于这一公式,我们推导出一个策略梯度表示,无缝地使标准LLM RL基础设施能够以端到端的方式优化工具使用行为以及摘要策略。我们用摘要增强策略优化(SUPO)实例化了这个框架,这是一种LLM RL算法,可以实现超越固定上下文限制的长时间跨度训练。对交互式函数调用和搜索任务的实验表明,SUPO显著提高了成功率,同时与基线相比,保持相同或甚至更低的工作上下文长度。我们还证明了对于复杂的搜索任务,当在测试时最大摘要轮次超过训练时的情况时,SUPO可以进一步改善评估性能。我们的结果将基于摘要的上下文管理确立为一种合理且可扩展的方法,用于训练RL代理超越固定的上下文长度限制。
更新时间: 2025-10-08 07:29:22
领域: cs.CL,cs.AI,cs.LG
Jailbreak Attack Initializations as Extractors of Compliance Directions
Safety-aligned LLMs respond to prompts with either compliance or refusal, each corresponding to distinct directions in the model's activation space. Recent works show that initializing attacks via self-transfer from other prompts significantly enhances their performance. However, the underlying mechanisms of these initializations remain unclear, and attacks utilize arbitrary or hand-picked initializations. This work presents that each gradient-based jailbreak attack and subsequent initialization gradually converge to a single compliance direction that suppresses refusal, thereby enabling an efficient transition from refusal to compliance. Based on this insight, we propose CRI, an initialization framework that aims to project unseen prompts further along compliance directions. We demonstrate our approach on multiple attacks, models, and datasets, achieving an increased attack success rate (ASR) and reduced computational overhead, highlighting the fragility of safety-aligned LLMs. A reference implementation is available at: https://amit1221levi.github.io/CRI-Jailbreak-Init-LLMs-evaluation.
Updated: 2025-10-08 07:28:04
标题: 越狱攻击初始化作为遵从方向的提取器
摘要: 安全对齐的LLM对提示作出遵从或拒绝的响应,每种响应对应模型激活空间中不同的方向。最近的研究表明,通过从其他提示进行自我传递初始化攻击显著提高了它们的性能。然而,这些初始化的基本机制仍不清楚,并且攻击利用任意或手动选择的初始化。本研究表明,基于梯度的越狱攻击和随后的初始化逐渐收敛到抑制拒绝的单一遵从方向,从而实现了从拒绝到遵从的高效过渡。基于这一观点,我们提出了CRI,一个旨在将未见提示沿着遵从方向进一步投影的初始化框架。我们在多个攻击、模型和数据集上展示了我们的方法,实现了攻击成功率(ASR)的提高和计算开销的减少,突显了安全对齐的LLM的脆弱性。参考实现可在以下网址找到:https://amit1221levi.github.io/CRI-Jailbreak-Init-LLMs-evaluation。
更新时间: 2025-10-08 07:28:04
领域: cs.CR,cs.LG
GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification
Integrating Pre-trained Language Models (PLMs) with Graph Neural Networks (GNNs) remains a central challenge in text-rich heterophilic graph learning. We propose a novel integration framework that enables effective fusion between powerful pre-trained text encoders and Relational Graph Convolutional Networks (R-GCNs). Our method enhances the alignment of textual and structural representations through a bidirectional fusion mechanism and contrastive node-level optimization. To evaluate the approach, we train two variants using different PLMs: Snowflake-Embed (state-of-the-art) and GTE-base, each paired with an R-GCN backbone. Experiments on five heterophilic benchmarks demonstrate that our integration method achieves state-of-the-art results on four datasets, surpassing existing GNN and large language model-based approaches. Notably, Snowflake-Embed + R-GCN improves accuracy on the Texas dataset by over 8\% and on Wisconsin by nearly 5\%. These results highlight the effectiveness of our fusion strategy for advancing text-rich graph representation learning.
Updated: 2025-10-08 07:26:24
标题: GMLM: 将图神经网络和语言模型连接起来,用于异质节点分类
摘要: 将预训练语言模型(PLMs)与图神经网络(GNNs)集成在文本丰富的异质图学习中仍然是一个核心挑战。我们提出了一个新颖的集成框架,能够有效地融合强大的预训练文本编码器和关系图卷积网络(R-GCNs)。我们的方法通过双向融合机制和对比节点级优化增强了文本和结构表示的对齐。为了评估该方法,我们使用不同的PLMs训练了两个变体:Snowflake-Embed(最先进)和GTE-base,每个都与一个R-GCN骨干配对。对五个异质基准数据集的实验表明,我们的集成方法在四个数据集上取得了最先进的结果,超过了现有的GNN和大型语言模型方法。值得注意的是,Snowflake-Embed + R-GCN在德克萨斯州数据集上的准确率提高了超过8%,在威斯康星州几乎提高了5%。这些结果突显了我们的融合策略对推进文本丰富的图表示学习的有效性。
更新时间: 2025-10-08 07:26:24
领域: cs.CL,cs.AI,cs.LG
A Dual-Agent Adversarial Framework for Robust Generalization in Deep Reinforcement Learning
Recently, empowered with the powerful capabilities of neural networks, reinforcement learning (RL) has successfully tackled numerous challenging tasks. However, while these models demonstrate enhanced decision-making abilities, they are increasingly prone to overfitting. For instance, a trained RL model often fails to generalize to even minor variations of the same task, such as a change in background color or other minor semantic differences. To address this issue, we propose a dual-agent adversarial policy learning framework, which allows agents to spontaneously learn the underlying semantics without introducing any human prior knowledge. Specifically, our framework involves a game process between two agents: each agent seeks to maximize the impact of perturbing on the opponent's policy by producing representation differences for the same state, while maintaining its own stability against such perturbations. This interaction encourages agents to learn generalizable policies, capable of handling irrelevant features from the high-dimensional observations. Extensive experimental results on the Procgen benchmark demonstrate that the adversarial process significantly improves the generalization performance of both agents, while also being applied to various RL algorithms, e.g., Proximal Policy Optimization (PPO). With the adversarial framework, the RL agent outperforms the baseline methods by a significant margin, especially in hard-level tasks, marking a significant step forward in the generalization capabilities of deep reinforcement learning.
Updated: 2025-10-08 07:19:57
标题: 一个双代理对抗框架用于深度强化学习中的稳健泛化
摘要: 最近,借助神经网络强大的能力,强化学习(RL)成功地解决了许多具有挑战性的任务。然而,虽然这些模型展示了增强的决策能力,但它们越来越容易出现过拟合现象。例如,一个经过训练的RL模型经常无法推广到同一任务的即使是微小变化,如背景颜色的改变或其他微小的语义差异。为了解决这个问题,我们提出了一个双代理对抗策略学习框架,允许代理在不引入任何人类先验知识的情况下自发地学习基础语义。具体而言,我们的框架涉及两个代理之间的博弈过程:每个代理都试图通过产生相同状态的表示差异来最大化对对手策略的影响,同时保持自身对这种扰动的稳定性。这种相互作用鼓励代理学习可处理高维观察中的无关特征的通用策略。在Procgen基准测试上的大量实验结果表明,对抗过程显著改善了两个代理的泛化性能,同时也适用于各种RL算法,如Proximal Policy Optimization(PPO)。通过对抗框架,RL代理在各种任务中显著优于基准方法,特别是在困难级别任务中,在深度强化学习的泛化能力方面迈出了重要一步。
更新时间: 2025-10-08 07:19:57
领域: cs.LG,cs.AI
Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by grounding them in external knowledge. However, its application in sensitive domains is limited by privacy risks. Existing private RAG methods typically rely on query-time differential privacy (DP), which requires repeated noise injection and leads to accumulated privacy loss. To address this issue, we propose DP-SynRAG, a framework that uses LLMs to generate differentially private synthetic RAG databases. Unlike prior methods, the synthetic text can be reused once created, thereby avoiding repeated noise injection and additional privacy costs. To preserve essential information for downstream RAG tasks, DP-SynRAG extends private prediction, which instructs LLMs to generate text that mimics subsampled database records in a DP manner. Experiments show that DP-SynRAG achieves superior performanec to the state-of-the-art private RAG systems while maintaining a fixed privacy budget, offering a scalable solution for privacy-preserving RAG.
Updated: 2025-10-08 07:15:50
标题: 差分私有合成文本生成用于检索增强生成(RAG)
摘要: 检索增强生成(RAG)通过将大型语言模型(LLM)与外部知识联系起来,增强了它们的性能。然而,在敏感领域中,由于隐私风险的限制,其应用受到了限制。现有的私密RAG方法通常依赖于查询时差分隐私(DP),这需要重复注入噪声,并导致累积的隐私损失。为解决这一问题,我们提出了DP-SynRAG,这是一个框架,利用LLM生成差分隐私合成RAG数据库。与先前的方法不同,合成文本一旦创建后可以重复使用,从而避免了重复注入噪声和额外的隐私成本。为了保留下游RAG任务所需的基本信息,DP-SynRAG扩展了私有预测,指导LLMs以差分隐私方式生成模拟子采样数据库记录的文本。实验证明,DP-SynRAG在保持固定隐私预算的同时,实现了比最先进的私密RAG系统更优越的性能,为隐私保护的RAG提供了可扩展的解决方案。
更新时间: 2025-10-08 07:15:50
领域: cs.CR,cs.CL,cs.LG
Friend or Foe Inside? Exploring In-Process Isolation to Maintain Memory Safety for Unsafe Rust
Rust is a popular memory-safe systems programming language. In order to interact with hardware or call into non-Rust libraries, Rust provides \emph{unsafe} language features that shift responsibility for ensuring memory safety to the developer. Failing to do so, may lead to memory safety violations in unsafe code which can violate safety of the entire application. In this work we explore in-process isolation with Memory Protection Keys as a mechanism to shield safe program sections from safety violations that may happen in unsafe sections. Our approach is easy to use and comprehensive as it prevents heap and stack-based violations. We further compare process-based and in-process isolation mechanisms and the necessary requirements for data serialization, communication, and context switching. Our results show that in-process isolation can be effective and efficient, permits for a high degree of automation, and also enables a notion of application rewinding where the safe program section may detect and safely handle violations in unsafe code.
Updated: 2025-10-08 07:10:47
标题: 朋友还是敌人?探索在进程中隔离以维护不安全的Rust内存安全
摘要: 锈蚀是一种流行的内存安全系统编程语言。为了与硬件交互或调用非Rust库,Rust提供了\emph{不安全}的语言功能,将确保内存安全的责任转移到开发人员。未能这样做可能导致不安全代码中的内存安全违规,这可能会违反整个应用程序的安全性。在这项工作中,我们探讨了使用内存保护键作为一种机制来保护安全程序部分免受可能在不安全部分发生的安全违规的进程内隔离。我们的方法易于使用且全面,因为它可以防止基于堆和栈的违规行为。我们进一步比较了基于进程和进程内隔离机制以及数据序列化、通信和上下文切换的必要要求。我们的结果表明,进程内隔离可以是有效且高效的,允许高度自动化,并且还支持应用程序倒带的概念,安全程序部分可以检测并安全处理不安全代码中的违规行为。
更新时间: 2025-10-08 07:10:47
领域: cs.CR,cs.PL
Dual Goal Representations
In this work, we introduce dual goal representations for goal-conditioned reinforcement learning (GCRL). A dual goal representation characterizes a state by "the set of temporal distances from all other states"; in other words, it encodes a state through its relations to every other state, measured by temporal distance. This representation provides several appealing theoretical properties. First, it depends only on the intrinsic dynamics of the environment and is invariant to the original state representation. Second, it contains provably sufficient information to recover an optimal goal-reaching policy, while being able to filter out exogenous noise. Based on this concept, we develop a practical goal representation learning method that can be combined with any existing GCRL algorithm. Through diverse experiments on the OGBench task suite, we empirically show that dual goal representations consistently improve offline goal-reaching performance across 20 state- and pixel-based tasks.
Updated: 2025-10-08 07:07:39
标题: 双重目标表征
摘要: 在这项工作中,我们引入了双重目标表示来进行目标条件强化学习(GCRL)。双重目标表示通过“与所有其他状态的时间距离集合”来表征状态;换句话说,它通过状态与每个其他状态的关系来编码状态,通过时间距离来衡量。这种表示提供了几个吸引人的理论性质。首先,它仅取决于环境的内在动态,并且对原始状态表示不变。其次,它包含足够的信息来恢复一个最优的目标达成策略,同时能够滤除外生噪声。基于这个概念,我们开发了一种实用的目标表示学习方法,可以与任何现有的GCRL算法结合使用。通过在OGBench任务套件上进行多样化实验,我们经验性地展示了双重目标表示在20个基于状态和像素的任务中持续改进离线目标达成性能。
更新时间: 2025-10-08 07:07:39
领域: cs.LG,cs.AI
Inefficiencies of Meta Agents for Agent Design
Recent works began to automate the design of agentic systems using meta-agents that propose and iteratively refine new agent architectures. In this paper, we examine three key challenges in a common class of meta-agents. First, we investigate how a meta-agent learns across iterations and find that simply expanding the context with all previous agents, as proposed by previous works, performs worse than ignoring prior designs entirely. We show that the performance improves with an evolutionary approach. Second, although the meta-agent designs multiple agents during training, it typically commits to a single agent at test time. We find that the designed agents have low behavioral diversity, limiting the potential for their complementary use. Third, we assess when automated design is economically viable. We find that only in a few cases--specifically, two datasets--the overall cost of designing and deploying the agents is lower than that of human-designed agents when deployed on over 15,000 examples. In contrast, the performance gains for other datasets do not justify the design cost, regardless of scale.
Updated: 2025-10-08 07:06:17
标题: 元代理的不效率对代理设计的影响
摘要: 最近的研究开始使用元代理自动化设计代理系统,这些代理提出并迭代地完善新的代理架构。在本文中,我们研究了常见类别的元代理中的三个关键挑战。首先,我们调查了元代理如何跨迭代学习,并发现简单地将上下文扩展到所有先前的代理,如以前的作品所建议的那样,效果不如完全忽略先前的设计。我们表明,采用进化方法可以提高性能。其次,虽然元代理在训练过程中设计了多个代理,但在测试时通常只承诺一个代理。我们发现设计的代理行为多样性低,限制了它们互补使用的潜力。第三,我们评估自动设计何时在经济上是可行的。我们发现只有在少数情况下--具体来说,两个数据集--设计和部署代理的总成本低于在超过15,000个示例上部署时人类设计的代理的成本。相比之下,对于其他数据集来说,无论规模如何,性能提升都不能证明设计成本的合理性。
更新时间: 2025-10-08 07:06:17
领域: cs.AI,cs.LG
Representation Gap of the Motzkin Monoid
The linear decomposition attack reveals a vulnerability in encryption algorithms operating within groups or monoids with excessively small representations. The representation gap, defined as the size of the smallest non-trivial representation, therefore serves as a metric to assess the security of these algorithms. This paper will demonstrate that the diagrammatic Motzkin monoids exhibit a large representation gap, positioning them as promising candidates for robust encryption algorithms.
Updated: 2025-10-08 06:59:10
标题: 莫茨金幺半群的表示差距
摘要: 线性分解攻击揭示了在具有过小表示的群或幺半群中运行的加密算法的一个漏洞。因此,表示间隙,定义为最小非平凡表示的大小,因此可作为评估这些算法安全性的度量标准。本文将证明,图示Motzkin幺半群具有较大的表示间隙,使它们成为强大加密算法的有前途的候选者。
更新时间: 2025-10-08 06:59:10
领域: math.RT,cs.CR,Primary: 05E10, 20M30, secondary: 94A60
IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs
Routing incoming queries to the most cost-effective LLM while maintaining response quality poses a fundamental challenge in optimizing performance-cost trade-offs for large-scale commercial systems. We present IPR\, -- \,a quality-constrained \textbf{I}ntelligent \textbf{P}rompt \textbf{R}outing framework that dynamically selects optimal models based on predicted response quality and user-specified tolerance levels. IPR introduces three key innovations: (1) a modular architecture with lightweight quality estimators trained on 1.5M prompts annotated with calibrated quality scores, enabling fine-grained quality prediction across model families; (2) a user-controlled routing mechanism with tolerance parameter $\tau \in [0,1]$ that provides explicit control over quality-cost trade-offs; and (3) an extensible design using frozen encoders with model-specific adapters, reducing new model integration from days to hours. To rigorously train and evaluate IPR, we curate an industrial-level dataset IPRBench\footnote{IPRBench will be released upon legal approval.}, a comprehensive benchmark containing 1.5 million examples with response quality annotations across 11 LLM candidates. Deployed on a major cloud platform, IPR achieves 43.9\% cost reduction while maintaining quality parity with the strongest model in the Claude family and processes requests with sub-150ms latency. The deployed system and additional product details are publicly available at https://aws.amazon.com/bedrock/intelligent-prompt-routing/
Updated: 2025-10-08 06:57:37
标题: IPR:智能提示路由与用户控制的质量成本权衡
摘要: 将传入查询路由到最具成本效益的LLM,同时保持响应质量,对于优化大规模商业系统的性能成本权衡而言是一个基本挑战。我们提出IPR -- 一种质量受限的智能提示路由框架,根据预测的响应质量和用户指定的容忍水平动态选择最佳模型。IPR引入了三个关键创新:(1)模块化架构,使用轻量级质量估计器,在150万个带有校准质量分数注释的提示上进行训练,实现对模型系列的细粒度质量预测;(2)一个用户可控的路由机制,使用容忍参数$\tau \in [0,1]$,提供对质量成本权衡的明确控制;(3)使用冻结编码器和特定于模型的适配器的可扩展设计,将新模型的集成时间从几天缩短到几小时。为了严格训练和评估IPR,我们精选了一个工业级别的数据集IPRBench,这是一个包含150万个示例的全面基准,跨11个LLM候选模型带有响应质量注释。在一个主要的云平台上部署,IPR实现了43.9\%的成本降低,同时保持与Claude家族最强模型的质量相当,并以低于150ms的延迟处理请求。部署的系统和额外的产品细节可在https://aws.amazon.com/bedrock/intelligent-prompt-routing/上公开获取。
更新时间: 2025-10-08 06:57:37
领域: cs.LG
Sustainable LSTM-Based Precoding for RIS-Aided mmWave MIMO Systems with Implicit CSI
In this paper, we propose a sustainable long short-term memory (LSTM)-based precoding framework for reconfigurable intelligent surface (RIS)-assisted millimeter-wave (mmWave) MIMO systems. Instead of explicit channel state information (CSI) estimation, the framework exploits uplink pilot sequences to implicitly learn channel characteristics, reducing both pilot overhead and inference complexity. Practical hardware constraints are addressed by incorporating the phase-dependent amplitude model of RIS elements, while a multi-label training strategy improves robustness when multiple near-optimal codewords yield comparable performance. Simulations show that the proposed design achieves over 90% of the spectral efficiency of exhaustive search (ES) with only 2.2% of its computation time, cutting energy consumption by nearly two orders of magnitude. The method also demonstrates resilience under distribution mismatch and scalability to larger RIS arrays, making it a practical and energy-efficient solution for sustainable 6G wireless networks.
Updated: 2025-10-08 06:53:44
标题: 可持续的基于LSTM的RIS辅助毫米波MIMO系统隐式CSI预编码
摘要: 在本文中,我们提出了一种可持续的基于长短期记忆(LSTM)的预编码框架,用于可重构智能表面(RIS)辅助的毫米波(mmWave)MIMO系统。该框架利用上行导频序列隐式学习信道特性,而不是显式信道状态信息(CSI)估计,从而减少了导频开销和推理复杂度。通过将RIS元件的相依赖幅度模型纳入实际硬件约束条件,同时采用多标签训练策略来提高在多个近似最优码字产生可比性能时的鲁棒性。仿真结果表明,所提出的设计仅用了2.2%的计算时间就实现了超过90%的穷举搜索(ES)的频谱效率,将能耗减少了近两个数量级。该方法还表现出在分布不匹配和扩展到更大的RIS阵列的情况下的弹性,使其成为可持续6G无线网络的实用和高效能解决方案。
更新时间: 2025-10-08 06:53:44
领域: eess.SP,cs.AI,cs.IT,cs.LG,cs.NI,math.IT
ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning
Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks. While recent advances in general motion tracking (GMT) have enabled humanoids to reproduce diverse human motions, these policies lack the precision and object awareness required for loco-manipulation. To this end, we introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data. First, a GMT policy, trained on large-scale human-only motion, serves as a task-agnostic base for generating human-like whole-body movements. An efficient but precise residual policy is then learned to refine the GMT outputs to improve locomotion and incorporate object interaction. To further facilitate efficient training, we design (i) a point-cloud-based object tracking reward for smoother optimization, (ii) a contact reward that encourages accurate humanoid body-object interactions, and (iii) a curriculum-based virtual object controller to stabilize early training. We evaluate ResMimic in both simulation and on a real Unitree G1 humanoid. Results show substantial gains in task success, training efficiency, and robustness over strong baselines. Videos are available at https://resmimic.github.io/ .
Updated: 2025-10-08 06:51:48
标题: ResMimic:从一般运动跟踪到通过残差学习实现人形整体身体运动控制
摘要: 人形机器人的全身定位操控为日常服务和仓储任务提供了变革性的能力。尽管最近在一般运动追踪(GMT)方面取得了进展,使人形机器人能够复制多样化的人类动作,但这些策略缺乏所需的精确性和对象意识,以进行定位操控。为此,我们引入了ResMimic,这是一个从人类运动数据中实现精确和富有表现力的人形机器人控制的两阶段残差学习框架。首先,一个在大规模人类运动数据上训练的GMT策略作为一种任务不可知的基础,用于生成类似人类的全身运动。然后学习一种高效而精确的残差策略,以细化GMT的输出,改善定位运动并整合对象交互。为了进一步促进高效训练,我们设计了(i)基于点云的对象追踪奖励,以实现更顺畅的优化,(ii)一个接触奖励,鼓励准确的人形机器人身体-对象交互,以及(iii)一个基于课程的虚拟对象控制器,以稳定早期训练。我们在仿真和真实的Unitree G1人形机器人上评估了ResMimic。结果显示,在任务成功率、训练效率和鲁棒性方面,与强基线相比取得了实质性的提升。视频可在https://resmimic.github.io/ 上观看。
更新时间: 2025-10-08 06:51:48
领域: cs.RO,cs.LG
A Diffusion Model for Regular Time Series Generation from Irregular Data with Completion and Masking
Generating realistic time series data is critical for applications in healthcare, finance, and science. However, irregular sampling and missing values present significant challenges. While prior methods address these irregularities, they often yield suboptimal results and incur high computational costs. Recent advances in regular time series generation, such as the diffusion-based ImagenTime model, demonstrate strong, fast, and scalable generative capabilities by transforming time series into image representations, making them a promising solution. However, extending ImagenTime to irregular sequences using simple masking introduces "unnatural" neighborhoods, where missing values replaced by zeros disrupt the learning process. To overcome this, we propose a novel two-step framework: first, a Time Series Transformer completes irregular sequences, creating natural neighborhoods; second, a vision-based diffusion model with masking minimizes dependence on the completed values. This approach leverages the strengths of both completion and masking, enabling robust and efficient generation of realistic time series. Our method achieves state-of-the-art performance, achieving a relative improvement in discriminative score by $70\%$ and in computational cost by $85\%$. Code is at https://github.com/azencot-group/ImagenI2R.
Updated: 2025-10-08 06:47:58
标题: 一个用于从不规则数据生成规律时间序列的扩散模型,包括完成和掩模
摘要: 生成逼真的时间序列数据对于医疗保健、金融和科学应用至关重要。然而,不规则采样和缺失值带来了重大挑战。尽管先前的方法解决了这些不规则性,但通常会产生次优结果并造成高计算成本。最近在正则时间序列生成方面取得的进展,例如基于扩散的ImagenTime模型,通过将时间序列转换为图像表示,展示了强大、快速和可扩展的生成能力,使其成为一种有希望的解决方案。然而,将ImagenTime扩展到不规则序列使用简单的蒙版引入了“不自然”的邻域,其中用零替换的缺失值破坏了学习过程。为了克服这一问题,我们提出了一个新颖的两步框架:首先,一个时间序列变换器完成不规则序列,创建自然邻域;其次,一个基于视觉的扩散模型通过蒙版最小化对已完成值的依赖。这种方法利用了完成和蒙版的优势,实现了对逼真时间序列的强大和高效生成。我们的方法实现了最先进的性能,在判别分数上实现了相对提高70%、计算成本提高了85%。源代码在https://github.com/azencot-group/ImagenI2R。
更新时间: 2025-10-08 06:47:58
领域: cs.LG
VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning
Few-shot learning (FSL) aims to recognize novel concepts from only a few labeled support samples. Recent studies enhance support features by incorporating additional semantic information or designing complex semantic fusion modules. However, they still suffer from hallucinating semantics that contradict the visual evidence due to the lack of grounding in actual instances, resulting in noisy guidance and costly corrections. To address these issues, we propose a novel framework, bridging Vision and Text with LLMs for Few-Shot Learning (VT-FSL), which constructs precise cross-modal prompts conditioned on Large Language Models (LLMs) and support images, seamlessly integrating them through a geometry-aware alignment. It mainly consists of Cross-modal Iterative Prompting (CIP) and Cross-modal Geometric Alignment (CGA). Specifically, the CIP conditions an LLM on both class names and support images to generate precise class descriptions iteratively in a single structured reasoning pass. These descriptions not only enrich the semantic understanding of novel classes but also enable the zero-shot synthesis of semantically consistent images. The descriptions and synthetic images act respectively as complementary textual and visual prompts, providing high-level class semantics and low-level intra-class diversity to compensate for limited support data. Furthermore, the CGA jointly aligns the fused textual, support, and synthetic visual representations by minimizing the kernelized volume of the 3-dimensional parallelotope they span. It captures global and nonlinear relationships among all representations, enabling structured and consistent multimodal integration. The proposed VT-FSL method establishes new state-of-the-art performance across ten diverse benchmarks, including standard, cross-domain, and fine-grained few-shot learning scenarios. Code is available at https://github.com/peacelwh/VT-FSL.
Updated: 2025-10-08 06:46:28
标题: VT-FSL:利用LLMs桥接视觉和文本进行少样本学习
摘要: Few-shot learning (FSL)旨在仅利用少量标记的支持样本识别新概念。最近的研究通过融合额外的语义信息或设计复杂的语义融合模块来增强支持特征。然而,它们仍然受到虚构语义的困扰,这些语义与视觉证据相矛盾,因为缺乏实例的基础,导致嘈杂的指导和昂贵的纠正。为了解决这些问题,我们提出了一个新颖的框架,通过LLMs(Large Language Models)桥接视觉和文本,用于Few-Shot Learning(VT-FSL),该框架构建了基于LLMs和支持图像的精确跨模态提示,通过几何感知对齐它们。它主要由跨模态迭代提示(CIP)和跨模态几何对齐(CGA)组成。具体来说,CIP将LLM条件设定为类名和支持图像,通过一次结构化推理传递迭代生成精确的类描述。这些描述不仅丰富了对新类的语义理解,还实现了语义一致图像的零样本合成。描述和合成图像分别作为互补的文本和视觉提示,提供高级类语义和低级类内多样性,以弥补有限的支持数据。此外,CGA通过最小化它们跨越的3维平行四边形的核化体积,联合对齐融合的文本、支持和合成视觉表示。它捕捉了所有表示之间的全局和非线性关系,实现了结构化和一致的多模态整合。提出的VT-FSL方法在包括标准、跨域和细粒度Few-shot学习场景在内的十个不同基准上建立了新的最先进性能。代码可在https://github.com/peacelwh/VT-FSL找到。
更新时间: 2025-10-08 06:46:28
领域: cs.CV,cs.LG,I.4.9
Learning to Rewrite Prompts for Bootstrapping LLMs on Downstream Tasks
In recent years, the growing interest in Large Language Models (LLMs) has significantly advanced prompt engineering, transitioning from manual design to model-based optimization. Prompts for LLMs generally comprise two components: the \textit{instruction}, which defines the task or objective, and the \textit{input}, which is tailored to the instruction type. In natural language generation (NLG) tasks such as machine translation, the \textit{input} component is particularly critical, while the \textit{instruction} component tends to be concise. Existing prompt engineering methods primarily focus on optimizing the \textit{instruction} component for general tasks, often requiring large-parameter LLMs as auxiliary tools. However, these approaches exhibit limited applicability for tasks like machine translation, where the \textit{input} component plays a more pivotal role. To address this limitation, this paper introduces a novel prompt optimization method specifically designed for machine translation tasks. The proposed approach employs a small-parameter model trained using a back-translation-based strategy, significantly reducing training overhead for single-task optimization while delivering highly effective performance. With certain adaptations, this method can also be extended to other downstream tasks.
Updated: 2025-10-08 06:40:06
标题: 学习重新编写提示以引导下游任务中的LLM自举
摘要: 近年来,对大型语言模型(LLMs)的兴趣不断增长,显著推进了提示工程,从手动设计过渡到基于模型的优化。LLMs的提示通常由两个组成部分组成:\textit{指示},定义任务或目标,以及\textit{输入},根据指示类型定制。在自然语言生成(NLG)任务中,如机器翻译,\textit{输入}组件尤其关键,而\textit{指示}组件往往是简洁的。现有的提示工程方法主要集中在优化\textit{指示}组件用于一般任务,通常需要大参数LLMs作为辅助工具。然而,这些方法对于机器翻译等任务的适用性有限,其中\textit{输入}组件发挥更关键的作用。为解决这一局限性,本文介绍了一种专门针对机器翻译任务设计的新颖提示优化方法。所提出的方法采用了一个小参数模型,使用基于回译的策略进行训练,显著减少了单任务优化的训练开销,同时提供了高效的性能。通过一些适应性调整,这种方法也可以扩展到其他下游任务中。
更新时间: 2025-10-08 06:40:06
领域: cs.CL,cs.AI,cs.LG,eess.AS
CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale
AI agents have significant potential to reshape cybersecurity, making a thorough assessment of their capabilities critical. However, existing evaluations fall short, because they are based on small-scale benchmarks and only measure static outcomes, failing to capture the full, dynamic range of real-world security challenges. To address these limitations, we introduce CyberGym, a large-scale benchmark featuring 1,507 real-world vulnerabilities across 188 software projects. Adjustable to different vulnerability analysis settings, CyberGym primarily tasks agents with generating a proof-of-concept test that reproduces a vulnerability, given only its text description and the corresponding codebase. Our extensive evaluation highlights that CyberGym effectively differentiates agents' and models' cybersecurity capabilities. Even the top-performing combinations only achieve a ~20% success rate, demonstrating the overall difficulty of CyberGym. Beyond static benchmarking, we show that CyberGym leads to the discovery of 35 zero-day vulnerabilities and 17 historically incomplete patches. These results underscore that CyberGym is not only a robust benchmark for measuring AI's progress in cybersecurity but also a platform for creating direct, real-world security impact.
Updated: 2025-10-08 06:32:58
标题: 网络体能房:在规模上评估人工智能代理的实际世界网络安全能力
摘要: 人工智能代理在重新塑造网络安全方面具有重大潜力,因此对其能力进行全面评估至关重要。然而,现有的评估存在不足,因为它们基于小规模基准测试,并且仅测量静态结果,未能捕捉到真实世界安全挑战的全面、动态范围。为了解决这些限制,我们引入了CyberGym,一个包含1,507个真实世界漏洞的大规模基准测试,涵盖188个软件项目。CyberGym可根据不同漏洞分析设置进行调整,主要任务是让代理生成一个能够复制漏洞的概念验证测试,仅提供文本描述和相应代码库。我们的广泛评估表明,CyberGym有效区分了代理和模型的网络安全能力。即使是表现最佳的组合也只能取得约20%的成功率,显示出CyberGym的整体难度。除了静态基准测试外,我们还展示CyberGym导致了35个零日漏洞和17个历史上不完整的补丁的发现。这些结果强调了CyberGym不仅是一个用于衡量人工智能在网络安全领域进展的强大基准测试,还是一个能够在直接、真实世界中产生安全影响的平台。
更新时间: 2025-10-08 06:32:58
领域: cs.CR,cs.AI,cs.LG
Is the Hard-Label Cryptanalytic Model Extraction Really Polynomial?
Deep Neural Networks (DNNs) have attracted significant attention, and their internal models are now considered valuable intellectual assets. Extracting these internal models through access to a DNN is conceptually similar to extracting a secret key via oracle access to a block cipher. Consequently, cryptanalytic techniques, particularly differential-like attacks, have been actively explored recently. ReLU-based DNNs are the most commonly and widely deployed architectures. While early works (e.g., Crypto 2020, Eurocrypt 2024) assume access to exact output logits, which are usually invisible, more recent works (e.g., Asiacrypt 2024, Eurocrypt 2025) focus on the hard-label setting, where only the final classification result (e.g., "dog" or "car") is available to the attacker. Notably, Carlini et al. (Eurocrypt 2025) demonstrated that model extraction is feasible in polynomial time even under this restricted setting. In this paper, we first show that the assumptions underlying their attack become increasingly unrealistic as the attack-target depth grows. In practice, satisfying these assumptions requires an exponential number of queries with respect to the attack depth, implying that the attack does not always run in polynomial time. To address this critical limitation, we propose a novel attack method called CrossLayer Extraction. Instead of directly extracting the secret parameters (e.g., weights and biases) of a specific neuron, which incurs exponential cost, we exploit neuron interactions across layers to extract this information from deeper layers. This technique significantly reduces query complexity and mitigates the limitations of existing model extraction approaches.
Updated: 2025-10-08 06:29:36
标题: 硬标签密码分析模型提取是否真的是多项式的?
摘要: 深度神经网络(DNNs)引起了重大关注,它们的内部模型现在被视为有价值的知识产权。通过访问DNN提取这些内部模型在概念上类似于通过对块密码的oracle访问提取秘密密钥。因此,密码分析技术,特别是类似差分攻击的技术最近受到了积极探索。基于ReLU的DNNs是最常见和广泛部署的架构。早期的研究(例如,Crypto 2020,Eurocrypt 2024)假定可以访问精确的输出对数,这些对数通常是不可见的,而最近的研究(例如,Asiacrypt 2024,Eurocrypt 2025)侧重于硬标签设置,即攻击者只能获取最终的分类结果(例如,“狗”或“汽车”)。值得注意的是,Carlini等人(Eurocrypt 2025)证明了即使在这种受限设置下,模型提取也是可行的,可以在多项式时间内完成。 在本文中,我们首先展示了随着攻击目标深度增加,其攻击基础变得越来越不现实。在实践中,满足这些假设需要相对于攻击深度的指数数量的查询,这意味着攻击并不总是在多项式时间内运行。为了解决这一关键限制,我们提出了一种名为CrossLayer Extraction的新型攻击方法。我们不直接提取特定神经元的秘密参数(例如权重和偏置),因为这会导致指数成本,而是利用层间神经元的相互作用从更深层中提取这些信息。这种技术显著降低了查询复杂度,并减轻了现有模型提取方法的限制。
更新时间: 2025-10-08 06:29:36
领域: cs.LG,cs.CR
Latent Representation Learning in Heavy-Ion Collisions with MaskPoint Transformer
A central challenge in high-energy nuclear physics is to extract informative features from the high-dimensional final-state data of heavy-ion collisions (HIC) in order to enable reliable downstream analyses. Traditional approaches often rely on selected observables, which may miss subtle but physically relevant structures in the data. To address this, we introduce a Transformer-based autoencoder trained with a two-stage paradigm: self-supervised pre-training followed by supervised fine-tuning. The pretrained encoder learns latent representations directly from unlabeled HIC data, providing a compact and information-rich feature space that can be adapted to diverse physics tasks. As a case study, we apply the method to distinguish between large and small collision systems, where it achieves significantly higher classification accuracy than PointNet. Principal component analysis and SHAP interpretation further demonstrate that the autoencoder captures complex nonlinear correlations beyond individual observables, yielding features with strong discriminative and explanatory power. These results establish our two-stage framework as a general and robust foundation for feature learning in HIC, opening the door to more powerful analyses of quark--gluon plasma properties and other emergent phenomena. The implementation is publicly available at https://github.com/Giovanni-Sforza/MaskPoint-AMPT.
Updated: 2025-10-08 06:27:10
标题: 在重离子碰撞中的MaskPoint Transformer潜在表示学习
摘要: 在高能核物理学中的一个核心挑战是从重离子碰撞(HIC)的高维最终状态数据中提取有信息量的特征,以便实现可靠的下游分析。传统方法通常依赖于选定的观测量,可能会忽略数据中微妙但物理相关的结构。为了解决这个问题,我们引入了一个基于Transformer的自编码器,通过一个两阶段范式进行训练:自监督预训练,然后是监督微调。预训练编码器直接从未标记的HIC数据中学习潜在表示,提供了一个紧凑且信息丰富的特征空间,可以适应多样的物理任务。作为一个案例研究,我们将该方法应用于区分大型和小型碰撞系统,在这方面,它的分类准确性显著高于PointNet。主成分分析和SHAP解释进一步证明,自编码器捕捉了超出单个观测量的复杂非线性相关性,产生具有强大的区分和解释能力的特征。这些结果将我们的两阶段框架确立为HIC特征学习的通用和稳健基础,为夸克--胶子等离子体特性和其他新兴现象的更强大分析打开了大门。该实现可在https://github.com/Giovanni-Sforza/MaskPoint-AMPT 上公开获取。
更新时间: 2025-10-08 06:27:10
领域: hep-ph,cs.LG
Enhancing GraphQL Security by Detecting Malicious Queries Using Large Language Models, Sentence Transformers, and Convolutional Neural Networks
GraphQL's flexibility, while beneficial for efficient data fetching, introduces unique security vulnerabilities that traditional API security mechanisms often fail to address. Malicious GraphQL queries can exploit the language's dynamic nature, leading to denial-of-service attacks, data exfiltration through injection, and other exploits. Existing solutions, such as static analysis, rate limiting, and general-purpose Web Application Firewalls, offer limited protection against sophisticated, context-aware attacks. This paper presents a novel, AI-driven approach for real-time detection of malicious GraphQL queries. Our method combines static analysis with machine learning techniques, including Large Language Models (LLMs) for dynamic schema-based configuration, Sentence Transformers (SBERT and Doc2Vec) for contextual embedding of query payloads, and Convolutional Neural Networks (CNNs), Random Forests, and Multilayer Perceptrons for classification. We detail the system architecture, implementation strategies optimized for production environments (including ONNX Runtime optimization and parallel processing), and evaluate the performance of our detection models and the overall system under load. Results demonstrate high accuracy in detecting various threats, including SQL injection, OS command injection, and XSS exploits, alongside effective mitigation of DoS and SSRF attempts. This research contributes a robust and adaptable solution for enhancing GraphQL API security.
Updated: 2025-10-08 06:22:30
标题: 利用大型语言模型、句子转换器和卷积神经网络检测恶意查询,增强GraphQL安全性
摘要: GraphQL的灵活性,虽然有助于高效地获取数据,但也引入了传统API安全机制往往无法解决的独特安全漏洞。恶意的GraphQL查询可以利用语言的动态特性,导致拒绝服务攻击、通过注入实现数据窃取和其他利用行为。现有解决方案,如静态分析、速率限制和通用Web应用程序防火墙,对复杂、上下文感知的攻击提供了有限的保护。本文提出了一种新颖的、基于人工智能的实时检测恶意GraphQL查询的方法。我们的方法结合了静态分析和机器学习技术,包括基于动态模式的大型语言模型(LLMs)配置,用于查询负载的上下文嵌入的Sentence Transformers(SBERT和Doc2Vec),以及用于分类的卷积神经网络(CNNs)、随机森林和多层感知器。我们详细介绍了系统架构、针对生产环境优化的实现策略(包括ONNX Runtime优化和并行处理),并评估了我们的检测模型和整个系统在负载下的性能。结果表明,在检测各种威胁(包括SQL注入、操作系统命令注入和XSS利用)方面具有高准确性,同时有效地缓解了DoS和SSRF尝试。这项研究为增强GraphQL API安全性提供了强大而适应性强的解决方案。
更新时间: 2025-10-08 06:22:30
领域: cs.CR,cs.AI,cs.LG
Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix
Self-attention layers have become fundamental building blocks of modern deep neural networks, yet their theoretical understanding remains limited, particularly from the perspective of random matrix theory. In this work, we provide a rigorous analysis of the singular value spectrum of the attention matrix and establish the first Gaussian equivalence result for attention. In a natural regime where the inverse temperature remains of constant order, we show that the singular value distribution of the attention matrix is asymptotically characterized by a tractable linear model. We further demonstrate that the distribution of squared singular values deviates from the Marchenko-Pastur law, which has been believed in previous work. Our proof relies on two key ingredients: precise control of fluctuations in the normalization term and a refined linearization that leverages favorable Taylor expansions of the exponential. This analysis also identifies a threshold for linearization and elucidates why attention, despite not being an entrywise operation, admits a rigorous Gaussian equivalence in this regime.
Updated: 2025-10-08 06:13:42
标题: 自注意力的高斯等效性:注意力矩阵的渐近谱分析
摘要: 自我注意力层已经成为现代深度神经网络的基本构建模块,然而它们的理论理解仍然有限,特别是从随机矩阵理论的角度来看。在这项工作中,我们对注意力矩阵的奇异值谱进行了严格分析,并建立了注意力的第一个高斯等价结果。在一个自然的范围内,其中逆温度保持恒定顺序,我们展示了注意力矩阵的奇异值分布在渐近上由一个易于处理的线性模型表征。我们进一步证明了奇异值的平方分布偏离了以前研究中所认为的马尔琴科-帕斯图尔定律。我们的证明依赖于两个关键要素:对归一化项中波动的精确控制和利用指数的有利泰勒展开的精细线性化。这个分析还确定了线性化的阈值,并阐明了为什么在这个范围内,尽管不是逐个元素操作,但注意力也能够具有严格的高斯等价性。
更新时间: 2025-10-08 06:13:42
领域: stat.ML,cs.LG,math.PR
AutoBalance: An Automatic Balancing Framework for Training Physics-Informed Neural Networks
Physics-Informed Neural Networks (PINNs) provide a powerful and general framework for solving Partial Differential Equations (PDEs) by embedding physical laws into loss functions. However, training PINNs is notoriously difficult due to the need to balance multiple loss terms, such as PDE residuals and boundary conditions, which often have conflicting objectives and vastly different curvatures. Existing methods address this issue by manipulating gradients before optimization (a "pre-combine" strategy). We argue that this approach is fundamentally limited, as forcing a single optimizer to process gradients from spectrally heterogeneous loss landscapes disrupts its internal preconditioning. In this work, we introduce AutoBalance, a novel "post-combine" training paradigm. AutoBalance assigns an independent adaptive optimizer to each loss component and aggregates the resulting preconditioned updates afterwards. Extensive experiments on challenging PDE benchmarks show that AutoBalance consistently outperforms existing frameworks, achieving significant reductions in solution error, as measured by both the MSE and $L^{\infty}$ norms. Moreover, AutoBalance is orthogonal to and complementary with other popular PINN methodologies, amplifying their effectiveness on demanding benchmarks.
Updated: 2025-10-08 06:13:03
标题: AutoBalance:用于训练物理信息神经网络的自动平衡框架
摘要: 物理信息神经网络(PINNs)提供了一个强大且通用的框架,通过将物理定律嵌入损失函数来解决偏微分方程(PDEs)。然而,由于需要平衡多个损失项,如PDE残差和边界条件,这些损失项通常具有冲突的目标和迥异的曲率,因此训练PINNs是非常困难的。现有方法通过在优化之前操纵梯度(一种“预合并”策略)来解决这个问题。我们认为这种方法在根本上存在局限性,因为强迫单一优化器处理来自光谱异质损失景观的梯度会破坏其内部预处理。在这项工作中,我们引入了AutoBalance,一种新颖的“后合并”训练范式。AutoBalance为每个损失组件分配独立的自适应优化器,并在之后聚合产生的预处理更新。对具有挑战性的PDE基准测试进行的大量实验表明,AutoBalance始终优于现有框架,在MSE和$L^{\infty}$范数两方面实现了显著的解决方案误差降低。此外,AutoBalance与其他流行的PINN方法正交且互补,加强了它们在严格基准测试上的有效性。
更新时间: 2025-10-08 06:13:03
领域: cs.LG,cs.NA,math.NA,math.OC
Distributed Algorithms for Multi-Agent Multi-Armed Bandits with Collision
We study the stochastic Multiplayer Multi-Armed Bandit (MMAB) problem, where multiple players select arms to maximize their cumulative rewards. Collisions occur when two or more players select the same arm, resulting in no reward, and are observed by the players involved. We consider a distributed setting without central coordination, where each player can only observe their own actions and collision feedback. We propose a distributed algorithm with an adaptive, efficient communication protocol. The algorithm achieves near-optimal group and individual regret, with a communication cost of only $\mathcal{O}(\log\log T)$. Our experiments demonstrate significant performance improvements over existing baselines. Compared to state-of-the-art (SOTA) methods, our approach achieves a notable reduction in individual regret. Finally, we extend our approach to a periodic asynchronous setting, proving the lower bound for this problem and presenting an algorithm that achieves logarithmic regret.
Updated: 2025-10-08 06:12:59
标题: 多智能体多臂赌博机碰撞的分布式算法
摘要: 我们研究了随机多人多臂赌博机(MMAB)问题,其中多个玩家选择臂以最大化其累积奖励。当两个或更多玩家选择相同的臂时,就会发生冲突,导致没有奖励,并由涉及的玩家观察到。我们考虑一个没有中央协调的分布式设置,每个玩家只能观察自己的行为和碰撞反馈。我们提出了一个具有自适应、高效通信协议的分布式算法。该算法实现了近乎最优的组和个体遗憾,通信成本仅为$\mathcal{O}(\log\log T)$。我们的实验表明,与现有基准线相比,性能有显著改进。与最先进的方法相比,我们的方法在个体遗憾方面实现了显著减少。最后,我们将我们的方法扩展到周期性异步设置,证明了这个问题的下限并提出了一个实现对数遗憾的算法。
更新时间: 2025-10-08 06:12:59
领域: cs.LG
TimeFormer: Transformer with Attention Modulation Empowered by Temporal Characteristics for Time Series Forecasting
Although Transformers excel in natural language processing, their extension to time series forecasting remains challenging due to insufficient consideration of the differences between textual and temporal modalities. In this paper, we develop a novel Transformer architecture designed for time series data, aiming to maximize its representational capacity. We identify two key but often overlooked characteristics of time series: (1) unidirectional influence from the past to the future, and (2) the phenomenon of decaying influence over time. These characteristics are introduced to enhance the attention mechanism of Transformers. We propose TimeFormer, whose core innovation is a self-attention mechanism with two modulation terms (MoSA), designed to capture these temporal priors of time series under the constraints of the Hawkes process and causal masking. Additionally, TimeFormer introduces a framework based on multi-scale and subsequence analysis to capture semantic dependencies at different temporal scales, enriching the temporal dependencies. Extensive experiments conducted on multiple real-world datasets show that TimeFormer significantly outperforms state-of-the-art methods, achieving up to a 7.45% reduction in MSE compared to the best baseline and setting new benchmarks on 94.04\% of evaluation metrics. Moreover, we demonstrate that the MoSA mechanism can be broadly applied to enhance the performance of other Transformer-based models.
Updated: 2025-10-08 06:07:30
标题: TimeFormer: 利用时间特性增强的带有注意力调节的Transformer用于时间序列预测
摘要: 尽管Transformer在自然语言处理方面表现出色,但由于对文本和时间序列模态之间的差异考虑不足,其在时间序列预测方面的拓展仍然具有挑战性。本文提出了一种专为时间序列数据设计的新型Transformer架构,旨在最大化其表示能力。我们确定了时间序列的两个关键但常被忽视的特征:(1)过去到未来的单向影响,以及(2)随时间衰减的影响现象。这些特征被引入以增强Transformer的注意机制。我们提出了TimeFormer,其核心创新是具有两个调制项(MoSA)的自注意机制,旨在在Hawkes过程和因果屏蔽的约束下捕捉时间序列的这些时间先验。此外,TimeFormer引入了基于多尺度和子序列分析的框架,以捕捉不同时间尺度上的语义依赖关系,丰富时间依赖性。在多个真实世界数据集上进行的大量实验表明,TimeFormer明显优于最先进的方法,相对于最佳基准线,MSE降低了高达7.45%,并在94.04%的评估指标上设定了新的基准。此外,我们证明MoSA机制可以广泛应用于增强其他基于Transformer的模型的性能。
更新时间: 2025-10-08 06:07:30
领域: cs.LG
Incremental Summarization for Customer Support via Progressive Note-Taking and Agent Feedback
We introduce an incremental summarization system for customer support agents that intelligently determines when to generate concise bullet notes during conversations, reducing agents' context-switching effort and redundant review. Our approach combines a fine-tuned Mixtral-8x7B model for continuous note generation with a DeBERTa-based classifier to filter trivial content. Agent edits refine the online notes generation and regularly inform offline model retraining, closing the agent edits feedback loop. Deployed in production, our system achieved a 3% reduction in case handling time compared to bulk summarization (with reductions of up to 9% in highly complex cases), alongside high agent satisfaction ratings from surveys. These results demonstrate that incremental summarization with continuous feedback effectively enhances summary quality and agent productivity at scale.
Updated: 2025-10-08 06:05:58
标题: 持续性笔记记录和代理人反馈的增量式客户支持摘要
摘要: 我们引入了一个逐步摘要系统,用于客服代理人智能地确定何时在对话过程中生成简洁的要点笔记,减少代理人的上下文切换工作量和冗余审查。我们的方法结合了一个经过精细调整的Mixtral-8x7B模型用于连续笔记生成,以及一个基于DeBERTa的分类器来过滤琐碎内容。代理人的编辑会完善在线笔记生成,并定期通知离线模型重新训练,闭合代理人编辑反馈循环。在生产环境中部署后,与批量摘要相比,我们的系统实现了案件处理时间减少3%的效果(在高度复杂的案例中可降低高达9%),同时在调查中获得了高满意度的代理人评分。这些结果表明,具有连续反馈的逐步摘要系统有效地提升了总结质量并提高了规模化的代理人生产力。
更新时间: 2025-10-08 06:05:58
领域: cs.CL,cs.AI,cs.LG
XRPO: Pushing the limits of GRPO with Targeted Exploration and Exploitation
Reinforcement learning algorithms such as GRPO have driven recent advances in large language model (LLM) reasoning. While scaling the number of rollouts stabilizes training, existing approaches suffer from limited exploration on challenging prompts and leave informative feedback signals underexploited, due to context-independent rollout allocation across prompts (e.g., generating 16 rollouts per prompt) and relying heavily on sparse rewards. This paper presents XRPO(eXplore - eXploit GRPO), a unified framework that recasts policy optimization through the principled lens of rollout exploration-exploitation. To enhance exploration, XRPO introduces a mathematically grounded rollout allocator that adaptively prioritizes prompts with higher potential for uncertainty reduction. It further addresses stagnation on zero-reward prompts through an in-context seeding strategy that injects curated exemplars, steering the model into more difficult reasoning trajectories. To strengthen exploitation, XRPO develops a group-relative, novelty-aware advantage sharpening mechanism that leverages sequence likelihoods to amplify low-probability yet correct responses, thereby extending the policy's reach beyond sparse rewards. Experiments across diverse math and coding benchmarks on both reasoning and non-reasoning models demonstrate that XRPO outperforms existing advances (e.g., GRPO and GSPO) up to 4% pass@1 and 6% cons@32, while accelerating training convergence by up to 2.7X.
Updated: 2025-10-08 05:53:56
标题: XRPO:通过定向探索和利用推动GRPO的极限
摘要: 强化学习算法,如GRPO,推动了最近在大型语言模型(LLM)推理领域的进展。尽管增加模拟次数可以稳定训练,但现有方法在具有挑战性的提示上存在有限的探索,并未充分利用信息反馈信号,这是由于跨提示的基于上下文独立的模拟分配(例如,为每个提示生成16次模拟)并且过度依赖稀疏奖励所导致的。本文介绍了XRPO(eXplore - eXploit GRPO),这是一个统一的框架,通过模拟探索-利用的原则性视角重新构建策略优化。为了增强探索,XRPO引入了一个数学基础的模拟分配器,自适应地优先考虑具有更高潜力的提示以减少不确定性。它通过一种上下文中的种子策略来解决零奖励提示上的停滞现象,该策略注入了经过策划的示例,引导模型进入更具挑战性的推理轨迹。为了加强利用,XRPO开发了一种组相对的、新颖感知的优势加强机制,利用序列的可能性来放大低概率但正确的响应,从而扩展策略的范围超出稀疏奖励。在各种数学和编码基准测试上的实验中,无论是推理模型还是非推理模型,XRPO都表现出比现有进展(例如GRPO和GSPO)高达4% pass@1和6% cons@32的性能,同时将训练收敛加速了最多2.7倍。
更新时间: 2025-10-08 05:53:56
领域: cs.LG
The Effect of Attention Head Count on Transformer Approximation
Transformer has become the dominant architecture for sequence modeling, yet a detailed understanding of how its structural parameters influence expressive power remains limited. In this work, we study the approximation properties of transformers, with particular emphasis on the role of the number of attention heads. Our analysis begins with the introduction of a generalized $D$-retrieval task, which we prove to be dense in the space of continuous functions, thereby providing the basis for our theoretical framework. We then establish both upper and lower bounds on the parameter complexity required for $\epsilon$-approximation. Specifically, we show that transformers with sufficiently many heads admit efficient approximation, whereas with too few heads, the number of parameters must scale at least as $O(1/\epsilon^{cT})$, for some constant $c$ and sequence length $T$. To the best of our knowledge, this constitutes the first rigorous lower bound of this type in a nonlinear and practically relevant setting. We further examine the single-head case and demonstrate that an embedding dimension of order $O(T)$ allows complete memorization of the input, where approximation is entirely achieved by the feed-forward block. Finally, we validate our theoretical findings with experiments on both synthetic data and real-world tasks, illustrating the practical relevance of our results.
Updated: 2025-10-08 05:27:25
标题: 关注头数对Transformer近似的影响
摘要: Transformer已经成为序列建模的主导架构,然而对其结构参数如何影响表达能力的详细理解仍然有限。在这项工作中,我们研究了transformer的逼近特性,特别强调了注意力头的数量的作用。我们的分析始于引入一个广义的D-检索任务,我们证明这个任务在连续函数空间中是密集的,从而为我们的理论框架提供了基础。然后我们建立了关于参数复杂度的上下界,用于ε-逼近。具体而言,我们展示了具有足够多头的transformer可以实现高效逼近,而头部太少时,参数数量必须至少按照O(1/ε^{cT})的速度增长,其中c是一个常数,T是序列长度。据我们所知,这构成了在非线性和实际相关设置中的第一个严格的下界。我们进一步研究了单头情况,并展示了一个与T同阶的嵌入维度可以完全记忆输入,逼近完全由前向传播块实现。最后,我们通过在合成数据和真实任务上的实验证实了我们的理论发现,展示了我们结果的实际相关性。
更新时间: 2025-10-08 05:27:25
领域: cs.LG,stat.ML
Rethinking Nonlinearity: Trainable Gaussian Mixture Modules for Modern Neural Architectures
Neural networks in general, from MLPs and CNNs to attention-based Transformers, are constructed from layers of linear combinations followed by nonlinear operations such as ReLU, Sigmoid, or Softmax. Despite their strength, these conventional designs are often limited in introducing non-linearity by the choice of activation functions. In this work, we introduce Gaussian Mixture-Inspired Nonlinear Modules (GMNM), a new class of differentiable modules that draw on the universal density approximation Gaussian mixture models (GMMs) and distance properties (metric space) of Gaussian kernal. By relaxing probabilistic constraints and adopting a flexible parameterization of Gaussian projections, GMNM can be seamlessly integrated into diverse neural architectures and trained end-to-end with gradient-based methods. Our experiments demonstrate that incorporating GMNM into architectures such as MLPs, CNNs, attention mechanisms, and LSTMs consistently improves performance over standard baselines. These results highlight GMNM's potential as a powerful and flexible module for enhancing efficiency and accuracy across a wide range of machine learning applications.
Updated: 2025-10-08 05:20:34
标题: 重新思考非线性:现代神经结构的可训练高斯混合模块
摘要: 总的来说,从多层感知器(MLPs)和卷积神经网络(CNNs)到基于注意力的Transformer,神经网络通常由线性组合层和接着的非线性操作(如ReLU、Sigmoid或Softmax)构成。尽管它们的强大,这些传统设计通常受到激活函数选择的限制,限制了引入非线性的能力。在这项工作中,我们引入了受高斯混合模型启发的非线性模块(GMNM),这是一种新的可微分模块类别,借鉴了通用密度逼近高斯混合模型(GMMs)和高斯核距离特性(度量空间)。通过放宽概率约束并采用对高斯投影的灵活参数化,GMNM可以无缝地集成到各种神经网络架构中,并通过基于梯度的方法进行端到端训练。我们的实验表明,将GMNM纳入到MLPs、CNNs、注意机制和LSTMs等架构中,始终优于标准基线的性能。这些结果突显了GMNM作为一个强大而灵活的模块,可以提高各种机器学习应用的效率和准确性。
更新时间: 2025-10-08 05:20:34
领域: cs.LG,math.PR
Fitzpatrick Thresholding for Skin Image Segmentation
Accurate estimation of the body surface area (BSA) involved by a rash, such as psoriasis, is critical for assessing rash severity, selecting an initial treatment regimen, and following clinical treatment response. Attempts at segmentation of inflammatory skin disease such as psoriasis perform markedly worse on darker skin tones, potentially impeding equitable care. We assembled a psoriasis dataset sourced from six public atlases, annotated for Fitzpatrick skin type, and added detailed segmentation masks for every image. Reference models based on U-Net, ResU-Net, and SETR-small are trained without tone information. On the tuning split we sweep decision thresholds and select (i) global optima and (ii) per Fitzpatrick skin tone optima for Dice and binary IoU. Adapting Fitzpatrick specific thresholds lifted segmentation performance for the darkest subgroup (Fitz VI) by up to +31 % bIoU and +24 % Dice on UNet, with consistent, though smaller, gains in the same direction for ResU-Net (+25 % bIoU, +18 % Dice) and SETR-small (+17 % bIoU, +11 % Dice). Because Fitzpatrick skin tone classifiers trained on Fitzpatrick-17k now exceed 95 % accuracy, the cost of skin tone labeling required for this technique has fallen dramatically. Fitzpatrick thresholding is simple, model-agnostic, requires no architectural changes, no re-training, and is virtually cost free. We demonstrate the inclusion of Fitzpatrick thresholding as a potential future fairness baseline.
Updated: 2025-10-08 05:15:49
标题: Fitzpatrick阈值法用于皮肤图像分割
摘要: 准确估计身体表面积(BSA)受累的皮肤疹,如银屑病,对于评估疹病严重程度、选择初始治疗方案和跟踪临床治疗反应至关重要。试图对炎症性皮肤病(如银屑病)进行分割在较深色皮肤上表现明显较差,可能阻碍公平护理。我们从六个公共图谱中收集了一组银屑病数据集,为每个图像注释了Fitzpatrick皮肤类型,并添加了详细的分割掩模。基于U-Net、ResU-Net和SETR-small的参考模型在没有色调信息的情况下进行训练。在调整分割时,我们扫描决策阈值,并选择(i)全局最优解和(ii)每个Fitzpatrick皮肤色调的最优解,以Dice和二进制IoU为指标。针对最黑暗的子组(Fitz VI),调整Fitzpatrick特定阈值可使UNet的分割性能提高最多+31%bIoU和+24%Dice,对于ResU-Net(+25%bIoU,+18%Dice)和SETR-small(+17%bIoU,+11%Dice),也在相同方向上获得了一致的、虽然较小的增益。由于在Fitzpatrick-17k上训练的Fitzpatrick皮肤色调分类器现在达到了95%以上的准确性,因此这种技术所需的肤色标记成本大幅降低。Fitzpatrick阈值处理简单,与模型无关,不需要架构更改,无需重新训练,几乎零成本。我们展示了将Fitzpatrick阈值处理作为潜在未来公平基准的可能性。
更新时间: 2025-10-08 05:15:49
领域: eess.IV,cs.LG,I.4.6; I.2.10; J.3
Securing WiFi Fingerprint-based Indoor Localization Systems from Malicious Access Points
WiFi fingerprint-based indoor localization schemes deliver highly accurate location data by matching the received signal strength indicator (RSSI) with an offline database using machine learning (ML) or deep learning (DL) models. However, over time, RSSI values degrade due to the malicious behavior of access points (APs), causing low positional accuracy due to RSSI value mismatch with the offline database. Existing literature lacks the detection of malicious APs in the online phase and mitigating their effects. This research addresses these limitations and proposes a long-term, reliable indoor localization scheme by incorporating malicious AP detection and their effect mitigation techniques. The proposed scheme uses a Light Gradient-Boosting Machine (LGBM) classifier to estimate locations and integrates simple yet efficient techniques to detect malicious APs based on online query data. Subsequently, a mitigation technique is incorporated that updates the offline database and online queries by imputing stable values for malicious APs using LGBM Regressors. Additionally, we introduce a noise addition mechanism in the offline database to capture the dynamic environmental effects. Extensive experimental evaluation shows that the proposed scheme attains a detection accuracy above 95% for each attack type. The mitigation strategy effectively restores the system's performance nearly to its original state when no malicious AP is present. The noise addition module reduces localization errors by nearly 16%. Furthermore, the proposed solution is lightweight, reducing the execution time by approximately 94% compared to the existing methods.
Updated: 2025-10-08 05:09:02
标题: 保护WiFi指纹室内定位系统免受恶意接入点的影响
摘要: 基于WiFi指纹的室内定位方案通过将接收信号强度指示器(RSSI)与离线数据库进行匹配,利用机器学习(ML)或深度学习(DL)模型提供高度精确的位置数据。然而,随着时间的推移,由于访问点(AP)的恶意行为,RSSI值会退化,导致与离线数据库不匹配而造成位置准确度降低。现有文献缺乏在线阶段对恶意AP的检测和减轻其影响。本研究解决了这些限制,并提出了一种长期可靠的室内定位方案,通过整合恶意AP检测和影响减轻技术。所提出的方案使用轻量级梯度增强机(LGBM)分类器来估计位置,并集成简单但高效的技术,基于在线查询数据检测恶意AP。随后,采用一种减轻技术,通过使用LGBM回归器为恶意AP输入稳定值来更新离线数据库和在线查询。此外,我们在离线数据库中引入了噪声添加机制,以捕捉动态环境效应。广泛的实验评估显示,所提出的方案对每种攻击类型的检测准确率均达到95%以上。减轻策略有效地将系统的性能恢复到无恶意AP时的几乎原始状态。噪声添加模块将定位误差减少了近16%。此外,与现有方法相比,所提出的解决方案轻量级,将执行时间减少约94%。
更新时间: 2025-10-08 05:09:02
领域: cs.CR
Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions
The Forward-Forward (FF) Algorithm is a recently proposed learning procedure for neural networks that employs two forward passes instead of the traditional forward and backward passes used in backpropagation. However, FF remains largely confined to supervised settings, leaving a gap at domains where learning signals can be yielded more naturally such as RL. In this work, inspired by FF's goodness function using layer activity statistics, we introduce Action-conditioned Root mean squared Q-Functions (ARQ), a novel value estimation method that applies a goodness function and action conditioning for local RL using temporal difference learning. Despite its simplicity and biological grounding, our approach achieves superior performance compared to state-of-the-art local backprop-free RL methods in the MinAtar and the DeepMind Control Suite benchmarks, while also outperforming algorithms trained with backpropagation on most tasks. Code can be found at https://github.com/agentic-learning-ai-lab/arq.
Updated: 2025-10-08 05:06:09
标题: 使用动作条件的均方根Q函数进行本地强化学习
摘要: 前向-前向(FF)算法是一种最近提出的神经网络学习过程,它使用两个前向传递而不是传统的反向传递和反向传递。然而,FF主要局限于监督设置,使得在学习信号可以更自然地产生的领域,如RL,存在一定的差距。在这项工作中,受FF使用层活动统计的好处函数的启发,我们引入了一种新颖的值估计方法,称为动作条件的均方根Q函数(ARQ),该方法应用了好处函数和动作条件来使用时间差异学习进行本地RL。尽管我们的方法简单且具有生物学基础,但在MinAtar和DeepMind控制套件基准测试中,我们的方法表现出优越性能,同时也优于大多数任务上使用反向传播训练的算法。代码可以在 https://github.com/agentic-learning-ai-lab/arq 找到。
更新时间: 2025-10-08 05:06:09
领域: cs.LG,cs.AI
Robot Learning from Any Images
We introduce RoLA, a framework that transforms any in-the-wild image into an interactive, physics-enabled robotic environment. Unlike previous methods, RoLA operates directly on a single image without requiring additional hardware or digital assets. Our framework democratizes robotic data generation by producing massive visuomotor robotic demonstrations within minutes from a wide range of image sources, including camera captures, robotic datasets, and Internet images. At its core, our approach combines a novel method for single-view physical scene recovery with an efficient visual blending strategy for photorealistic data collection. We demonstrate RoLA's versatility across applications like scalable robotic data generation and augmentation, robot learning from Internet images, and single-image real-to-sim-to-real systems for manipulators and humanoids. Video results are available at https://sihengz02.github.io/RoLA .
Updated: 2025-10-08 05:05:48
标题: 机器人从任意图像中学习
摘要: 我们介绍了RoLA,这是一个将任何野外图像转化为交互式、启用物理的机器人环境的框架。与先前的方法不同,RoLA直接在单个图像上运行,无需额外的硬件或数字资产。我们的框架通过从各种图像来源,包括相机捕捉、机器人数据集和互联网图像,生成大量视觉运动机器人演示,从而使机器人数据生成民主化。在其核心,我们的方法结合了一种新颖的单视图物理场景恢复方法和一种高效的视觉融合策略,用于逼真数据收集。我们展示了RoLA在可伸缩机器人数据生成和增强、机器人学习来自互联网图像,以及用于机械手和人形机器人的单图像真实-虚拟-真实系统等应用中的多功能性。视频结果可在https://sihengz02.github.io/RoLA 上查看。
更新时间: 2025-10-08 05:05:48
领域: cs.RO,cs.CV,cs.LG
Q-Learning with Fine-Grained Gap-Dependent Regret
We study fine-grained gap-dependent regret bounds for model-free reinforcement learning in episodic tabular Markov Decision Processes. Existing model-free algorithms achieve minimax worst-case regret, but their gap-dependent bounds remain coarse and fail to fully capture the structure of suboptimality gaps. We address this limitation by establishing fine-grained gap-dependent regret bounds for both UCB-based and non-UCB-based algorithms. In the UCB-based setting, we develop a novel analytical framework that explicitly separates the analysis of optimal and suboptimal state-action pairs, yielding the first fine-grained regret upper bound for UCB-Hoeffding (Jin et al., 2018). To highlight the generality of this framework, we introduce ULCB-Hoeffding, a new UCB-based algorithm inspired by AMB (Xu et al.,2021) but with a simplified structure, which enjoys fine-grained regret guarantees and empirically outperforms AMB. In the non-UCB-based setting, we revisit the only known algorithm AMB, and identify two key issues in its algorithm design and analysis: improper truncation in the $Q$-updates and violation of the martingale difference condition in its concentration argument. We propose a refined version of AMB that addresses these issues, establishing the first rigorous fine-grained gap-dependent regret for a non-UCB-based method, with experiments demonstrating improved performance over AMB.
Updated: 2025-10-08 05:02:16
标题: Q-学习与细粒度的间隔相关遗憾
摘要: 我们研究了在基于模型的无模型强化学习中的细粒度依赖间隙的遗憾界,在片段性的表格马尔可夫决策过程中。现有的基于模型的算法实现了极小化最坏情况下的遗憾,但它们的依赖间隙界限仍然粗糙,未能完全捕捉到次优性差距的结构。我们通过建立细粒度依赖间隙的遗憾界限来解决这一限制,其中包括基于UCB和非UCB的算法。在基于UCB的设置中,我们开发了一个新颖的分析框架,明确地分离了对最优和次优状态-动作对的分析,得到了UCB-Hoeffding的首个细粒度遗憾上界。为了突显这一框架的普适性,我们引入了ULCB-Hoeffding,这是一种新的基于UCB的算法,受到AMB的启发,但具有简化的结构,它具有细粒度遗憾保证,并且在实验中表现优于AMB。在非UCB的设置中,我们重新审视了唯一已知的算法AMB,并确定了其算法设计和分析中的两个关键问题:$Q$更新中的不当截断以及在其集中论证中违反了鞅差条件。我们提出了AMB的改进版本,解决了这些问题,为非UCB方法建立了首个严格的细粒度依赖间隙遗憾,实验表明其性能优于AMB。
更新时间: 2025-10-08 05:02:16
领域: stat.ML,cs.LG
The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators
A core challenge in scientific machine learning, and scientific computing more generally, is modeling continuous phenomena which (in practice) are represented discretely. Machine-learned operators (MLOs) have been introduced as a means to achieve this modeling goal, as this class of architecture can perform inference at arbitrary resolution. In this work, we evaluate whether this architectural innovation is sufficient to perform "zero-shot super-resolution," namely to enable a model to serve inference on higher-resolution data than that on which it was originally trained. We comprehensively evaluate both zero-shot sub-resolution and super-resolution (i.e., multi-resolution) inference in MLOs. We decouple multi-resolution inference into two key behaviors: 1) extrapolation to varying frequency information; and 2) interpolating across varying resolutions. We empirically demonstrate that MLOs fail to do both of these tasks in a zero-shot manner. Consequently, we find MLOs are not able to perform accurate inference at resolutions different from those on which they were trained, and instead they are brittle and susceptible to aliasing. To address these failure modes, we propose a simple, computationally-efficient, and data-driven multi-resolution training protocol that overcomes aliasing and that provides robust multi-resolution generalization.
Updated: 2025-10-08 04:59:56
标题: 机器学习算子中零-shot 超分辨率的虚假承诺
摘要: 在科学机器学习和科学计算中的一个核心挑战是对连续现象进行建模,而这些现象在实践中是以离散方式表示的。机器学习操作符(MLOs)被引入作为实现这一建模目标的手段,因为这类架构可以在任意分辨率下执行推理。在这项工作中,我们评估了这种架构创新是否足以实现“零样本超分辨率”,即使模型能够在比其原始训练数据更高分辨率上进行推理。我们全面评估了MLO中的零样本子分辨率和超分辨率(即多分辨率)推理。我们将多分辨率推理分解为两个关键行为:1)对不同频率信息进行外推;2)在不同分辨率之间进行插值。我们实验证明,MLOs无法以零样本方式执行这两项任务。因此,我们发现MLOs无法在不同于其训练分辨率的分辨率上执行准确的推理,而是脆弱且容易出现混叠。为了解决这些失败模式,我们提出了一个简单、计算高效且数据驱动的多分辨率训练协议,该协议克服了混叠问题并提供了强大的多分辨率泛化能力。
更新时间: 2025-10-08 04:59:56
领域: cs.LG,cs.AI,cs.CV
ProCut: LLM Prompt Compression via Attribution Estimation
In large-scale industrial LLM systems, prompt templates often expand to thousands of tokens as teams iteratively incorporate sections such as task instructions, few-shot examples, and heuristic rules to enhance robustness and coverage. This expansion leads to bloated prompts that are difficult to maintain and incur significant inference latency and serving costs. To address this, we introduce Prompt Compression via Attribution Estimation (ProCut), a flexible, LLM-agnostic, training-free framework that compresses prompts through attribution analysis. ProCut segments prompt templates into semantically meaningful units, quantifies their impact on task performance, and prunes low-utility components. Through extensive experiments on five public benchmark datasets and real-world industrial prompts, we show that ProCut achieves substantial prompt size reductions (78% fewer tokens in production) while maintaining or even slightly improving task performance (up to 62% better than alternative methods). We further introduce an LLM-driven attribution estimator that reduces compression latency by over 50%, and demonstrate that ProCut integrates seamlessly with existing prompt-optimization frameworks to produce concise, high-performing prompts.
Updated: 2025-10-08 04:59:55
标题: ProCut:通过属性估计进行LLM提示压缩
摘要: 在大规模工业LLM系统中,及时模板通常会扩展到数千个标记,团队会迭代地将任务说明、少量示例和启发式规则等部分纳入其中,以增强鲁棒性和覆盖范围。这种扩展导致了臃肿的提示,难以维护,并产生显著的推理延迟和服务成本。为了解决这个问题,我们引入了通过归因估计进行提示压缩(ProCut)的灵活的、与LLM无关的、无需训练的框架,通过归因分析压缩提示。ProCut将提示模板分割成语义上有意义的单元,量化它们对任务性能的影响,并修剪低效率的组件。通过对五个公共基准数据集和实际工业提示的大量实验,我们展示了ProCut实现了显著的提示大小减小(生产中减少了78%的标记),同时保持或甚至略微改善了任务性能(比其他方法高达62%)。我们进一步引入了一种由LLM驱动的归因估计器,将压缩延迟减少了50%以上,并证明ProCut与现有的提示优化框架无缝集成,可以生成简洁、高性能的提示。
更新时间: 2025-10-08 04:59:55
领域: cs.CL,cs.LG
Distilling Lightweight Language Models for C/C++ Vulnerabilities
The increasing complexity of modern software systems exacerbates the prevalence of security vulnerabilities, posing risks of severe breaches and substantial economic loss. Consequently, robust code vulnerability detection is essential for software security. While Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing, their potential for automated code vulnerability detection remains underexplored. This paper presents FineSec, a novel framework that harnesses LLMs through knowledge distillation to enable efficient and precise vulnerability identification in C/C++ codebases. FineSec utilizes knowledge distillation to transfer expertise from large teacher models to compact student models, achieving high accuracy with minimal computational cost. By integrating data preparation, training, evaluation, and continuous learning into a unified, single-task workflow, FineSec offers a streamlined approach. Extensive evaluations on C/C++ codebases demonstrate its superiority over both base models and larger LLMs in identifying complex vulnerabilities and logical flaws, establishing FineSec as a practical and scalable solution for real-world software security. To facilitate reproducibility, the datasets, source code, and experimental results are made publicly available at: https://github.com/yangxiaoxuan123/FineSec_detect.
Updated: 2025-10-08 04:58:51
标题: 提炼轻量级语言模型用于C/C++漏洞
摘要: 现代软件系统的不断复杂化加剧了安全漏洞的普遍存在,存在严重侵犯和实质经济损失的风险。因此,强大的代码漏洞检测对于软件安全至关重要。虽然大型语言模型(LLMs)在自然语言处理方面展示出卓越的能力,但它们在自动化代码漏洞检测方面的潜力尚未得到充分开发。本文介绍了FineSec,一个通过知识蒸馏利用LLMs实现对C/C++代码库中漏洞进行高效和精确识别的新框架。FineSec利用知识蒸馏将大型教师模型的专业知识转移给紧凑的学生模型,以最小的计算成本实现高准确率。通过将数据准备、训练、评估和持续学习整合到统一的单一任务工作流程中,FineSec提供了一种简化的方法。对C/C++代码库的广泛评估表明,FineSec在识别复杂漏洞和逻辑缺陷方面优于基本模型和更大的LLMs,确立FineSec作为实际且可扩展的解决方案,用于真实世界的软件安全。为了促进可重现性,数据集、源代码和实验结果已经公开发布在:https://github.com/yangxiaoxuan123/FineSec_detect。
更新时间: 2025-10-08 04:58:51
领域: cs.CR,cs.AI
Membership Inference Attacks on LLM-based Recommender Systems
Large language models (LLMs) based Recommender Systems (RecSys) can flexibly adapt recommendation systems to different domains. It utilizes in-context learning (ICL), i.e., the prompts, to customize the recommendation functions, which include sensitive historical user-specific item interactions, e.g., implicit feedback like clicked items or explicit product reviews. Such private information may be exposed to novel privacy attack. However, no study has been done on this important issue. We design four membership inference attacks (MIAs), aiming to reveal whether victims' historical interactions have been used by system prompts. They are \emph{direct inquiry, hallucination, similarity, and poisoning attacks}, each of which utilizes the unique features of LLMs or RecSys. We have carefully evaluated them on three LLMs that have been used to develop ICL-LLM RecSys and two well-known RecSys benchmark datasets. The results confirm that the MIA threat on LLM RecSys is realistic: direct inquiry and poisoning attacks showing significantly high attack advantages. We have also analyzed the factors affecting these attacks, such as the number of shots in system prompts and the position of the victim in the shots.
Updated: 2025-10-08 04:48:57
标题: 基于LLM的推荐系统的成员推断攻击
摘要: 大型语言模型(LLMs)基于推荐系统(RecSys)可以灵活地适应不同领域的推荐系统。它利用上下文学习(ICL),即提示语,来定制推荐功能,其中包括敏感的历史用户特定项目互动,例如,隐式反馈如点击项目或明确的产品评论。这些私人信息可能会暴露给新型的隐私攻击。然而,对这一重要问题尚未进行研究。我们设计了四种成员推理攻击(MIAs),旨在揭示受害者的历史互动是否被系统提示使用。它们分别是\emph{直接询问、幻觉、相似性和毒化攻击},每种攻击利用了LLMs或RecSys的独特特征。我们已经在三个用于开发ICL-LLM RecSys的LLMs和两个知名的RecSys基准数据集上对它们进行了仔细评估。结果证实了LLM RecSys上的MIA威胁是现实的:直接询问和毒化攻击显示出明显高攻击优势。我们还分析了影响这些攻击的因素,例如系统提示中的拍摄次数和受害者在拍摄中的位置。
更新时间: 2025-10-08 04:48:57
领域: cs.IR,cs.AI,cs.CL,cs.CR,cs.LG
A Comparative Analysis of Contextual Representation Flow in State-Space and Transformer Architectures
State Space Models (SSMs) have recently emerged as efficient alternatives to Transformer-Based Models (TBMs) for long-sequence processing, offering linear scaling and lower memory use. Yet, how contextual information flows across layers and tokens in these architectures remains understudied. We present the first unified, token- and layer-level analysis of representation propagation in SSMs and TBMs. Using centered kernel alignment, stability metrics, and probing, we characterize how representations evolve within and across layers. We find a key divergence: TBMs rapidly homogenize token representations, with diversity reemerging only in later layers, while SSMs preserve token uniqueness early but converge to homogenization deeper. Theoretical analysis and parameter randomization further reveal that oversmoothing in TBMs stems from architectural design, whereas in SSMs it arises mainly from training dynamics. These insights clarify the inductive biases of both architectures and inform future model and training designs for long-context reasoning.
Updated: 2025-10-08 04:46:11
标题: 状态空间和变压器架构中上下文表示流的比较分析
摘要: 状态空间模型(SSMs)最近已经成为处理长序列的高效替代方案,相比于基于Transformer的模型(TBMs),SSMs具有线性扩展和更低的内存使用。然而,这些架构中的层和标记之间的上下文信息流如何仍然鲜为人知。我们提出了对SSMs和TBMs中表示传播的第一个统一的标记和层级分析。通过使用中心核对齐、稳定性度量和探测,我们表征了表示在层内部和层之间如何演变。我们发现一个关键的分歧:TBMs迅速使标记表示同质化,在后续层中才重新出现多样性,而SSMs在早期保持标记的唯一性,但在更深层次上收敛到同质化。理论分析和参数随机化进一步揭示,TBMs中的过度平滑源于架构设计,而在SSMs中主要源于训练动态。这些见解澄清了这两种架构的归纳偏见,并为长上下文推理的未来模型和训练设计提供了信息。
更新时间: 2025-10-08 04:46:11
领域: cs.CL,cs.LG
AutoDAN-Reasoning: Enhancing Strategies Exploration based Jailbreak Attacks with Test-Time Scaling
Recent advancements in jailbreaking large language models (LLMs), such as AutoDAN-Turbo, have demonstrated the power of automated strategy discovery. AutoDAN-Turbo employs a lifelong learning agent to build a rich library of attack strategies from scratch. While highly effective, its test-time generation process involves sampling a strategy and generating a single corresponding attack prompt, which may not fully exploit the potential of the learned strategy library. In this paper, we propose to further improve the attack performance of AutoDAN-Turbo through test-time scaling. We introduce two distinct scaling methods: Best-of-N and Beam Search. The Best-of-N method generates N candidate attack prompts from a sampled strategy and selects the most effective one based on a scorer model. The Beam Search method conducts a more exhaustive search by exploring combinations of strategies from the library to discover more potent and synergistic attack vectors. According to the experiments, the proposed methods significantly boost performance, with Beam Search increasing the attack success rate by up to 15.6 percentage points on Llama-3.1-70B-Instruct and achieving a nearly 60% relative improvement against the highly robust GPT-o4-mini compared to the vanilla method.
Updated: 2025-10-08 04:37:35
标题: AutoDAN-Reasoning: 基于测试时间缩放的增强策略探索的越狱攻击
摘要: 最近对大型语言模型(LLMs)进行越狱的最新进展,如AutoDAN-Turbo,展示了自动化策略发现的强大力量。AutoDAN-Turbo利用终身学习代理构建了一个丰富的攻击策略库。虽然非常有效,但其测试时生成过程涉及对策略进行采样并生成一个对应的攻击提示,可能无法充分利用学习到的策略库的潜力。在本文中,我们提出通过测试时缩放进一步提高AutoDAN-Turbo的攻击性能。我们引入了两种不同的缩放方法:Best-of-N和Beam Search。Best-of-N方法从采样策略生成N个候选攻击提示,并根据评分模型选择最有效的一个。Beam Search方法通过探索来自库中策略的组合进行更详尽的搜索,发现更强大和协同作用的攻击向量。根据实验,所提出的方法显著提升了性能,Beam Search在Llama-3.1-70B-Instruct上将攻击成功率提高了高达15.6个百分点,并与基本方法相比,在与高度鲁棒的GPT-o4-mini对比时取得了近60%的相对改进。
更新时间: 2025-10-08 04:37:35
领域: cs.CR,cs.AI
Control-Augmented Autoregressive Diffusion for Data Assimilation
Despite recent advances in test-time scaling and finetuning of diffusion models, guidance in Auto-Regressive Diffusion Models (ARDMs) remains underexplored. We introduce an amortized framework that augments pretrained ARDMs with a lightweight controller network, trained offline by previewing future ARDM rollouts and learning stepwise controls that anticipate upcoming observations under a terminal cost objective. We evaluate this framework in the context of data assimilation (DA) for chaotic spatiotemporal partial differential equations (PDEs), a setting where existing methods are often computationally prohibitive and prone to forecast drift under sparse observations. Our approach reduces DA inference to a single forward rollout with on-the-fly corrections, avoiding expensive adjoint computations and/or optimizations during inference. We demonstrate that our method consistently outperforms four state-of-the-art baselines in stability, accuracy, and physical fidelity across two canonical PDEs and six observation regimes. We will release code and checkpoints publicly.
Updated: 2025-10-08 04:37:32
标题: 增强控制的自回归扩散在数据同化中的应用
摘要: 尽管最近在扩展测试时间和微调扩散模型方面取得了进展,但自回归扩散模型(ARDMs)的指导仍未得到充分探讨。我们引入了一个摊销框架,通过预览未来ARDM展开并学习逐步控制,以满足终端成本目标,从而增强了预训练的ARDMs。我们在混沌时空部分微分方程(PDEs)的数据同化(DA)环境中评估了这个框架,这是一个现有方法往往在计算上代价高昂且容易在稀疏观测下出现预测漂移的设置。我们的方法将DA推断简化为一次前向展开,实时进行校正,避免了昂贵的伴随计算和/或推断过程中的优化。我们展示了我们的方法在两个经典PDEs和六种观测模式下,在稳定性、准确性和物理保真性方面始终优于四种最先进的基线方法。我们将公开发布代码和检查点。
更新时间: 2025-10-08 04:37:32
领域: cs.LG,cs.AI,cs.CV
StruSR: Structure-Aware Symbolic Regression with Physics-Informed Taylor Guidance
Symbolic regression aims to find interpretable analytical expressions by searching over mathematical formula spaces to capture underlying system behavior, particularly in scientific modeling governed by physical laws. However, traditional methods lack mechanisms for extracting structured physical priors from time series observations, making it difficult to capture symbolic expressions that reflect the system's global behavior. In this work, we propose a structure-aware symbolic regression framework, called StruSR, that leverages trained Physics-Informed Neural Networks (PINNs) to extract locally structured physical priors from time series data. By performing local Taylor expansions on the outputs of the trained PINN, we obtain derivative-based structural information to guide symbolic expression evolution. To assess the importance of expression components, we introduce a masking-based attribution mechanism that quantifies each subtree's contribution to structural alignment and physical residual reduction. These sensitivity scores steer mutation and crossover operations within genetic programming, preserving substructures with high physical or structural significance while selectively modifying less informative components. A hybrid fitness function jointly minimizes physics residuals and Taylor coefficient mismatch, ensuring consistency with both the governing equations and the local analytical behavior encoded by the PINN. Experiments on benchmark PDE systems demonstrate that StruSR improves convergence speed, structural fidelity, and expression interpretability compared to conventional baselines, offering a principled paradigm for physics-grounded symbolic discovery.
Updated: 2025-10-08 04:37:04
标题: StruSR: 结构感知的符号回归与基于物理信息的泰勒引导
摘要: 符号回归旨在通过搜索数学公式空间来捕捉基础系统行为,尤其是在受物理定律控制的科学建模中找到可解释的分析表达式。然而,传统方法缺乏从时间序列观察中提取结构化物理先验的机制,这使得难以捕捉反映系统全局行为的符号表达式。在这项工作中,我们提出了一种称为StruSR的结构感知符号回归框架,利用训练有素的物理信息神经网络(PINNs)从时间序列数据中提取本地结构化的物理先验。通过对训练有素的PINN的输出进行局部泰勒展开,我们获得基于导数的结构信息,以指导符号表达式的演变。为了评估表达式组件的重要性,我们引入了一种基于掩模的归因机制,量化每个子树对结构对齐和物理残差减少的贡献。这些敏感性分数在遗传编程中引导突变和交叉操作,保留具有高物理或结构意义的子结构,同时有选择地修改信息较少的组件。混合适应度函数共同最小化物理残差和泰勒系数不匹配,确保与既定方程和PINN编码的本地解析行为一致。对基准PDE系统的实验表明,与传统基线相比,StruSR提高了收敛速度、结构保真度和表达可解释性,为基于物理的符号发现提供了原则性范式。
更新时间: 2025-10-08 04:37:04
领域: cs.LG,cs.CV
Unveiling the Basin-Like Loss Landscape in Large Language Models
We discover the emergence of \textit{basins} in the loss landscape of large language models. As model scale increases, LLMs become progressively more resilient to random perturbations in the parameter space, giving rise to expansive stability regions where models exhibit nearly identical performance, but outside of which their capabilities collapse. We observe that pre-training creates a \textit{basic capability} basin, and subsequent alignment fine-tuning forms \textit{specific capability} basins (e.g., safety, math, coding). Thus, we argue that benign fine-tuning confined to the basin should preserve prior capabilities. Besides, we also analyze the loss landscape for worst-case directions, which is consistently sharp and detrimental. We find that adversarial fine-tuning moves along the nearly worst-case directions, thus rapidly degrading model capabilities. Finally, we provide a theoretical analysis demonstrating that the basin size bounds the performance degradation of any fine-tuning, including the adversarial ones, while also guaranteeing the model robustness w.r.t. input perturbations, suggesting the benefit of enlarging basins.
Updated: 2025-10-08 04:36:39
标题: 揭示大型语言模型中类似盆地的损失景观
摘要: 我们发现大型语言模型的损失景观中出现了\textit{盆地}的出现。随着模型规模的增加,LLMs对参数空间中的随机扰动变得越来越具有弹性,形成了扩张性稳定区域,在这些区域内模型表现出几乎相同的性能,但在这些区域之外,它们的能力会崩溃。我们观察到预训练创建了一个\textit{基本能力}盆地,而后续的对齐微调形成了\textit{特定能力}盆地(例如,安全性、数学、编码)。因此,我们认为限制在盆地内的良性微调应该保留先前的能力。此外,我们还分析了最坏情况下的损失景观,这种情况一贯是尖锐和有害的。我们发现对抗性微调沿着几乎最坏情况的方向移动,因此迅速降低了模型的能力。最后,我们提供了一项理论分析,证明了盆地大小限制了任何微调(包括对抗性微调)的性能下降,同时也保证了模型对输入扰动的鲁棒性,建议扩大盆地的好处。
更新时间: 2025-10-08 04:36:39
领域: cs.LG
Three Forms of Stochastic Injection for Improved Distribution-to-Distribution Generative Modeling
Modeling transformations between arbitrary data distributions is a fundamental scientific challenge, arising in applications like drug discovery and evolutionary simulation. While flow matching offers a natural framework for this task, its use has thus far primarily focused on the noise-to-data setting, while its application in the general distribution-to-distribution setting is underexplored. We find that in the latter case, where the source is also a data distribution to be learned from limited samples, standard flow matching fails due to sparse supervision. To address this, we propose a simple and computationally efficient method that injects stochasticity into the training process by perturbing source samples and flow interpolants. On five diverse imaging tasks spanning biology, radiology, and astronomy, our method significantly improves generation quality, outperforming existing baselines by an average of 9 FID points. Our approach also reduces the transport cost between input and generated samples to better highlight the true effect of the transformation, making flow matching a more practical tool for simulating the diverse distribution transformations that arise in science.
Updated: 2025-10-08 04:36:34
标题: 三种随机注入形式用于改善分布到分布生成建模
摘要: 建模任意数据分布之间的转换是一个基本的科学挑战,在药物发现和进化模拟等应用中出现。虽然流匹配为这一任务提供了一个自然的框架,但迄今为止,其主要集中在从噪音到数据的设置中,而在一般的分布到分布的设置中的应用尚未得到充分开发。我们发现,在后一种情况下,源也是要从有限样本中学习的数据分布时,由于监督稀疏,标准的流匹配失败了。为了解决这个问题,我们提出了一种简单且计算效率高的方法,通过扰动源样本和流插值器,在训练过程中引入随机性。在涵盖生物学、放射学和天文学的五个不同的成像任务中,我们的方法显著改善了生成质量,平均比现有基线高出9个FID点。我们的方法还减少了输入和生成样本之间的传输成本,以更好地突出转换的真正效果,使流匹配成为模拟科学中出现的多样分布转换的更实用工具。
更新时间: 2025-10-08 04:36:34
领域: cs.LG
Obfuscated Quantum and Post-Quantum Cryptography
In this work, we present an experimental deployment of a new design for combined quantum key distribution (QKD) and post-quantum cryptography (PQC). Novel to our system is the dynamic obfuscation of the QKD-PQC sequence of operations, the number of operations, and parameters related to the operations; coupled to the integration of a GPS-free quantum synchronization protocol within the QKD process. We compare the performance and overhead of our QKD-PQC system relative to a standard QKD system with one-time pad encryption, demonstrating that our design can operate in real time with little additional overhead caused by the new security features. Since our system can offer additional defensive strategies against a wide spectrum of practical attacks that undermine deployed QKD, PQC, and certain combinations of these two primitives, we suggest that our design represents one of the most secure communication systems currently available. Given the dynamic nature of its obfuscation attributes, our new system can also be adapted in the field to defeat yet-to-be-discovered practical attacks.
Updated: 2025-10-08 04:34:03
标题: 混淆的量子和后量子密码学
摘要: 在这项工作中,我们介绍了一种新的设计,用于结合量子密钥分发(QKD)和后量子密码学(PQC)。我们系统的新颖之处在于动态模糊QKD-PQC操作序列、操作次数以及与操作相关的参数;同时将不需要GPS的量子同步协议整合到QKD过程中。我们比较了我们的QKD-PQC系统与使用一次性密码加密的标准QKD系统的性能和开销,证明我们的设计可以实时运行,新的安全功能几乎没有额外的开销。由于我们的系统可以提供额外的防御策略来抵御一系列破坏已部署的QKD、PQC以及这两种原语组合的实际攻击,我们认为我们的设计代表了目前最安全的通信系统之一。考虑到其模糊属性的动态特性,我们的新系统也可以在现场适应尚未发现的实际攻击。
更新时间: 2025-10-08 04:34:03
领域: quant-ph,cs.CR
Chem-NMF: Multi-layer $α$-divergence Non-Negative Matrix Factorization for Cardiorespiratory Disease Clustering, with Improved Convergence Inspired by Chemical Catalysts and Rigorous Asymptotic Analysis
Non-Negative Matrix Factorization (NMF) is an unsupervised learning method offering low-rank representations across various domains such as audio processing, biomedical signal analysis, and image recognition. The incorporation of $\alpha$-divergence in NMF formulations enhances flexibility in optimization, yet extending these methods to multi-layer architectures presents challenges in ensuring convergence. To address this, we introduce a novel approach inspired by the Boltzmann probability of the energy barriers in chemical reactions to theoretically perform convergence analysis. We introduce a novel method, called Chem-NMF, with a bounding factor which stabilizes convergence. To our knowledge, this is the first study to apply a physical chemistry perspective to rigorously analyze the convergence behaviour of the NMF algorithm. We start from mathematically proven asymptotic convergence results and then show how they apply to real data. Experimental results demonstrate that the proposed algorithm improves clustering accuracy by 5.6% $\pm$ 2.7% on biomedical signals and 11.1% $\pm$ 7.2% on face images (mean $\pm$ std).
Updated: 2025-10-08 04:31:10
标题: Chem-NMF:基于化学催化剂和严格渐近分析的心呼疾病聚类的多层$α$-散度非负矩阵分解及改进的收敛性
摘要: 非负矩阵分解(NMF)是一种无监督学习方法,在各个领域如音频处理、生物医学信号分析和图像识别中提供低秩表示。在NMF公式中引入$\alpha$-散度增强了优化的灵活性,但将这些方法扩展到多层架构在确保收敛方面存在挑战。为了解决这个问题,我们引入了一种受化学反应中能量障碍玻尔兹曼概率启发的新方法,理论上进行了收敛分析。我们引入了一种名为Chem-NMF的新方法,其中包含一个边界因子,稳定了收敛性。据我们所知,这是第一项应用物理化学视角严格分析NMF算法收敛行为的研究。我们从数学上证明的渐近收敛结果开始,然后展示它们如何应用于真实数据。实验结果表明,所提出的算法将生物医学信号的聚类准确性提高了5.6% ± 2.7%,人脸图像提高了11.1% ± 7.2%(均值 ± 标准偏差)。
更新时间: 2025-10-08 04:31:10
领域: cs.LG,eess.SP
AI-Driven Forecasting and Monitoring of Urban Water System
Underground water and wastewater pipelines are vital for city operations but plagued by anomalies like leaks and infiltrations, causing substantial water loss, environmental damage, and high repair costs. Conventional manual inspections lack efficiency, while dense sensor deployments are prohibitively expensive. In recent years, artificial intelligence has advanced rapidly and is increasingly applied to urban infrastructure. In this research, we propose an integrated AI and remote-sensor framework to address the challenge of leak detection in underground water pipelines, through deploying a sparse set of remote sensors to capture real-time flow and depth data, paired with HydroNet - a dedicated model utilizing pipeline attributes (e.g., material, diameter, slope) in a directed graph for higher-precision modeling. Evaluations on a real-world campus wastewater network dataset demonstrate that our system collects effective spatio-temporal hydraulic data, enabling HydroNet to outperform advanced baselines. This integration of edge-aware message passing with hydraulic simulations enables accurate network-wide predictions from limited sensor deployments. We envision that this approach can be effectively extended to a wide range of underground water pipeline networks.
Updated: 2025-10-08 04:28:38
标题: 人工智能驱动的城市水系统预测和监测
摘要: 地下水和废水管道对城市运行至关重要,但常常受到漏水和渗透等异常问题的困扰,导致大量水资源浪费、环境破坏和高昂的维修成本。传统的人工检查缺乏效率,而密集的传感器部署成本高昂。近年来,人工智能迅速发展并越来越多地应用于城市基础设施。在这项研究中,我们提出了一个集成的人工智能和远程传感器框架,以解决地下水管道漏水检测的挑战,通过部署一组稀疏的远程传感器来捕获实时流量和深度数据,配合使用HydroNet - 一个专门利用管道属性(例如材料、直径、坡度)在有向图中进行更高精度建模的模型。对一个真实的校园废水网络数据集的评估表明,我们的系统收集了有效的时空水力数据,使HydroNet能够胜过先进的基线模型。这种边缘感知信息传递与水力模拟的集成使得从有限的传感器部署中得出准确的网络范围预测成为可能。我们预见这种方法可以有效地扩展到各种地下水管道网络。
更新时间: 2025-10-08 04:28:38
领域: cs.LG,cs.AI
HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models
Vision-Language Models (VLMs) have made significant progress in multimodal tasks. However, their performance often deteriorates in long-context scenarios, particularly long videos. While Rotary Position Embedding (RoPE) has been widely adopted for length generalization in Large Language Models (LLMs), extending vanilla RoPE to capture the intricate spatial-temporal dependencies in videos remains an unsolved challenge. Existing methods typically allocate different frequencies within RoPE to encode 3D positional information. However, these allocation strategies mainly rely on heuristics, lacking in-depth theoretical analysis. In this paper, we first study how different allocation strategies impact the long-context capabilities of VLMs. Our analysis reveals that current multimodal RoPEs fail to reliably capture semantic similarities over extended contexts. To address this issue, we propose HoPE, a Hybrid of Position Embedding designed to improve the long-context capabilities of VLMs. HoPE introduces a hybrid frequency allocation strategy for reliable semantic modeling over arbitrarily long contexts, and a dynamic temporal scaling mechanism to facilitate robust learning and flexible inference across diverse context lengths. Extensive experiments across four video benchmarks on long video understanding and retrieval tasks demonstrate that HoPE consistently outperforms existing methods, confirming its effectiveness. Our code is available at https://github.com/hrlics/HoPE.
Updated: 2025-10-08 04:28:29
标题: HoPE: 长上下文视觉语言模型的位置嵌入混合
摘要: 视觉语言模型(VLMs)在多模态任务中取得了显著进展。然而,在长文本情景下,特别是长视频中,它们的性能通常会下降。虽然旋转位置嵌入(RoPE)已被广泛应用于大型语言模型(LLMs)中的长度泛化,但将基本RoPE扩展以捕捉视频中复杂的时空依赖关系仍然是一个未解决的挑战。现有方法通常在RoPE中分配不同的频率来编码3D位置信息。然而,这些分配策略主要依赖于启发式方法,缺乏深入的理论分析。在本文中,我们首先研究不同的分配策略如何影响VLMs的长文本能力。我们的分析揭示了当前的多模态RoPE无法可靠地捕捉在扩展上下文中的语义相似性。为了解决这个问题,我们提出了HoPE,一种用于改善VLMs长文本能力的混合位置嵌入。HoPE引入了一种混合频率分配策略,可在任意长的情景下可靠地进行语义建模,并引入了动态时间缩放机制,以促进跨不同情景长度的鲁棒学习和灵活推理。在四个视频基准测试中进行的广泛实验表明,HoPE始终优于现有方法,验证了其有效性。我们的代码可以在https://github.com/hrlics/HoPE上找到。
更新时间: 2025-10-08 04:28:29
领域: cs.LG,cs.CV
Unsupervised Backdoor Detection and Mitigation for Spiking Neural Networks
Spiking Neural Networks (SNNs) have gained increasing attention for their superior energy efficiency compared to Artificial Neural Networks (ANNs). However, their security aspects, particularly under backdoor attacks, have received limited attention. Existing defense methods developed for ANNs perform poorly or can be easily bypassed in SNNs due to their event-driven and temporal dependencies. This paper identifies the key blockers that hinder traditional backdoor defenses in SNNs and proposes an unsupervised post-training detection framework, Temporal Membrane Potential Backdoor Detection (TMPBD), to overcome these challenges. TMPBD leverages the maximum margin statistics of temporal membrane potential (TMP) in the final spiking layer to detect target labels without any attack knowledge or data access. We further introduce a robust mitigation mechanism, Neural Dendrites Suppression Backdoor Mitigation (NDSBM), which clamps dendritic connections between early convolutional layers to suppress malicious neurons while preserving benign behaviors, guided by TMP extracted from a small, clean, unlabeled dataset. Extensive experiments on multiple neuromorphic benchmarks and state-of-the-art input-aware dynamic trigger attacks demonstrate that TMPBD achieves 100% detection accuracy, while NDSBM reduces the attack success rate from 100% to 8.44%, and to 2.81% when combined with detection, without degrading clean accuracy.
Updated: 2025-10-08 04:25:35
标题: 无监督的脑后门检测和减轻措施针对脉冲神经网络
摘要: 脉冲神经网络(SNNs)由于其与人工神经网络(ANNs)相比卓越的能源效率而受到越来越多的关注。然而,它们的安全方面,特别是在背门攻击下,受到了有限的关注。为ANNs开发的现有防御方法在SNNs中表现不佳,或者由于它们的事件驱动和时间依赖性而很容易被绕过。本文确定了阻碍SNNs中传统后门防御的关键障碍,并提出了一种无监督的后训练检测框架,即Temporal Membrane Potential Backdoor Detection(TMPBD),以克服这些挑战。TMPBD利用最终脉冲层中的时间膜电位(TMP)的最大边界统计来检测目标标签,无需任何攻击知识或数据访问。我们进一步引入了一种强大的缓解机制,即神经树突抑制后门缓解(NDSBM),它在早期卷积层之间夹紧树突连接以抑制恶意神经元,同时保留良性行为,由从小型、干净、无标签的数据集中提取的TMP指导。对多个神经形态基准和最先进的输入感知动态触发攻击进行了广泛实验,结果显示TMPBD实现了100%的检测准确率,而NDSBM将攻击成功率从100%降低到8.44%,当与检测结合时降至2.81%,而不降低干净准确率。
更新时间: 2025-10-08 04:25:35
领域: cs.CR,cs.CV,cs.LG
Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning
Reinforcement learning (RL) has become central to enhancing reasoning in large language models (LLMs). Yet on-policy algorithms such as Group Relative Policy Optimization (GRPO) often suffer in early training: noisy gradients from low-quality rollouts lead to unstable updates and inefficient exploration. We introduce Slow-Fast Policy Optimization (SFPO), a simple yet efficient framework to address these limitations via decomposing each step into three stages: a short fast trajectory of inner steps on the same batch, a reposition mechanism to control off-policy drift, and a final slow correction. This reposition-before-update design preserves the objective and rollout process unchanged, making SFPO plug-compatible with existing policy-gradient pipelines. Extensive experiments demonstrate that SFPO consistently improves stability, reduces rollouts, and accelerates convergence of reasoning RL training. Specifically, it outperforms GRPO by up to 2.80 points in average on math reasoning benchmarks. It also achieves up to 4.93\texttimes{} fewer rollouts and an up to 4.19\texttimes{} reduction in wall-clock time to match GRPO's best accuracy.
Updated: 2025-10-08 04:24:36
标题: 慢-快策略优化:用于LLM推理的重新定位-更新
摘要: 强化学习(RL)已成为提升大型语言模型(LLMs)推理能力的核心。然而,像Group Relative Policy Optimization(GRPO)这样的在线算法在早期训练中经常遇到困难:来自低质量回合的嘈杂梯度导致不稳定的更新和低效的探索。我们引入了Slow-Fast Policy Optimization(SFPO),这是一个简单而高效的框架,通过将每一步分解为三个阶段来解决这些限制:在同一批次上进行短暂快速轨迹的内步骤,一个重新定位机制来控制离线策略漂移,以及一个最终缓慢的修正。这种在更新前重新定位的设计保持了目标和回合过程不变,使得SFPO能够与现有的策略梯度流水线兼容。大量实验证明,SFPO不断提高稳定性,减少回合次数,并加速推理RL培训的收敛。具体地,在数学推理基准测试中,它的平均表现优于GRPO高达2.80点。它还实现了高达4.93倍的减少回合次数和高达4.19倍的减少墙钟时间,以匹配GRPO的最佳准确率。
更新时间: 2025-10-08 04:24:36
领域: cs.LG,cs.AI,cs.CL,stat.ML
Guiding Mixture-of-Experts with Temporal Multimodal Interactions
Mixture-of-Experts (MoE) architectures have become pivotal for large-scale multimodal models. However, their routing mechanisms typically overlook the informative, time-varying interaction dynamics between modalities. This limitation hinders expert specialization, as the model cannot explicitly leverage intrinsic modality relationships for effective reasoning. To address this, we propose a novel framework that guides MoE routing using quantified temporal interaction. A multimodal interaction-aware router learns to dispatch tokens to experts based on the nature of their interactions. This dynamic routing encourages experts to acquire generalizable interaction-processing skills rather than merely learning task-specific features. Our framework builds on a new formulation of temporal multimodal interaction dynamics, which are used to guide expert routing. We first demonstrate that these temporal multimodal interactions reveal meaningful patterns across applications, and then show how they can be leveraged to improve both the design and performance of MoE-based models. Comprehensive experiments on challenging multimodal benchmarks validate our approach, demonstrating both enhanced performance and improved interpretability.
Updated: 2025-10-08 04:21:03
标题: 指导混合专家与时间多模态交互
摘要: 专家混合(MoE)架构已成为大规模多模态模型的关键。然而,它们的路由机制通常忽视了模态之间具有信息量且时变的交互动力学。这种限制阻碍了专家的专业化,因为模型无法明确利用内在的模态关系进行有效推理。为了解决这个问题,我们提出了一个新颖的框架,通过量化时间交互来指导MoE路由。一个多模态交互感知路由器学习根据它们之间的交互性质将令牌分配给专家。这种动态路由鼓励专家获得通用的交互处理技能,而不仅仅是学习任务特定的特征。我们的框架建立在一种新的时间多模态交互动力学的表达基础上,用于指导专家路由。我们首先证明这些时间多模态交互在应用程序中展示出有意义的模式,然后展示它们如何被利用来改进基于MoE的模型的设计和性能。在具有挑战性的多模态基准测试上的全面实验验证了我们的方法,展示了增强的性能和改进的可解释性。
更新时间: 2025-10-08 04:21:03
领域: cs.LG
Towards the Worst-case Robustness of Large Language Models
Recent studies have revealed the vulnerability of large language models to adversarial attacks, where adversaries craft specific input sequences to induce harmful, violent, private, or incorrect outputs. In this work, we study their worst-case robustness, i.e., whether an adversarial example exists that leads to such undesirable outputs. We upper bound the worst-case robustness using stronger white-box attacks, indicating that most current deterministic defenses achieve nearly 0\% worst-case robustness. We propose a general tight lower bound for randomized smoothing using fractional knapsack solvers or 0-1 knapsack solvers, and using them to bound the worst-case robustness of all stochastic defenses. Based on these solvers, we provide theoretical lower bounds for several previous empirical defenses. For example, we certify the robustness of a specific case, smoothing using a uniform kernel, against \textit{any possible attack} with an average $\ell_0$ perturbation of 2.02 or an average suffix length of 6.41.
Updated: 2025-10-08 04:21:02
标题: 朝向大型语言模型最坏情况下的稳健性
摘要: 最近的研究揭示了大型语言模型对对抗攻击的脆弱性,其中对手制定特定的输入序列以诱导有害、暴力、私密或不正确的输出。在这项工作中,我们研究它们的最坏情况鲁棒性,即是否存在一个对抗性示例导致这种不良输出。我们使用更强大的白盒攻击来上限最坏情况的鲁棒性,表明大多数当前的确定性防御几乎实现了0\%的最坏情况鲁棒性。我们提出了使用分数背包求解器或0-1背包求解器的随机平滑的一般紧密下界,并使用它们来限制所有随机防御的最坏情况鲁棒性。基于这些求解器,我们为几种先前的经验防御提供了理论下界。例如,我们证明了一种特定情况下的鲁棒性,即使用均匀核进行平滑,对\textit{任何可能的攻击}具有平均$\ell_0$扰动为2.02或平均后缀长度为6.41。
更新时间: 2025-10-08 04:21:02
领域: cs.LG
Simulation-based inference via telescoping ratio estimation for trawl processes
The growing availability of large and complex datasets has increased interest in temporal stochastic processes that can capture stylized facts such as marginal skewness, non-Gaussian tails, long memory, and even non-Markovian dynamics. While such models are often easy to simulate from, parameter estimation remains challenging. Simulation-based inference (SBI) offers a promising way forward, but existing methods typically require large training datasets or complex architectures and frequently yield confidence (credible) regions that fail to attain their nominal values, raising doubts on the reliability of estimates for the very features that motivate the use of these models. To address these challenges, we propose a fast and accurate, sample-efficient SBI framework for amortized posterior inference applicable to intractable stochastic processes. The proposed approach relies on two main steps: first, we learn the posterior density by decomposing it sequentially across parameter dimensions. Then, we use Chebyshev polynomial approximations to efficiently generate independent posterior samples, enabling accurate inference even when Markov chain Monte Carlo methods mix poorly. We further develop novel diagnostic tools for SBI in this context, as well as post-hoc calibration techniques; the latter not only lead to performance improvements of the learned inferential tool, but also to the ability to reuse it directly with new time series of varying lengths, thus amortizing the training cost. We demonstrate the method's effectiveness on trawl processes, a class of flexible infinitely divisible models that generalize univariate Gaussian processes, applied to energy demand data.
Updated: 2025-10-08 04:20:39
标题: 通过望远镜比率估计进行拖网过程的基于仿真的推断
摘要: 大型和复杂数据集的不断增加增加了人们对能够捕捉如边际偏斜、非高斯尾部、长期记忆甚至非马尔科夫动态等风格化事实的时间随机过程的兴趣。虽然这些模型通常易于模拟,但参数估计仍然具有挑战性。基于模拟的推断(SBI)提供了一种有前途的前进方式,但现有方法通常需要大量训练数据集或复杂的结构,并且经常产生未能达到名义值的置信(可信)区域,对激发这些模型使用的特征的估计的可靠性产生疑问。为解决这些挑战,我们提出了一种快速准确、样本高效的SBI框架,适用于不可解的随机过程的摊销后验推断。所提出的方法依赖于两个主要步骤:首先,我们通过在参数维度上依次分解后验密度来学习后验密度。然后,我们使用切比雪夫多项式逼近来高效生成独立的后验样本,即使马尔可夫链蒙特卡洛方法混合性差,也能实现准确推断。我们在这种情况下进一步开发了SBI的新诊断工具,以及事后校准技术;后者不仅提高了学习推断工具的性能,还使其能够直接在新的时间序列上重复使用,从而摊销训练成本。我们在拖网过程上展示了该方法在能源需求数据上的有效性,拖网过程是一类灵活的无限可分模型,推广了一元高斯过程。
更新时间: 2025-10-08 04:20:39
领域: stat.ML,cs.LG,stat.ME
POME: Post Optimization Model Edit via Muon-style Projection
We introduce Post-Optimization Model Edit (POME), a new algorithm that enhances the performance of fine-tuned large language models using only their pretrained and fine-tuned checkpoints, without requiring extra data or further optimization. The core idea is to apply a muon-style projection to $\Delta W$, the difference between the fine-tuned and pretrained weights. This projection uses truncated singular value decomposition (SVD) to equalize the influence of dominant update directions and prune small singular values, which often represent noise. As a simple post-processing step, POME is completely decoupled from the training pipeline. It requires zero modifications and imposes no overhead, making it universally compatible with any optimizer or distributed framework. POME delivers consistent gains, boosting average performance by +2.5\% on GSM8K and +1.0\% on code generation. Its broad applicability -- from 7B foundation models to 72B RLHF-instructed models -- establishes it as a practical, zero-cost enhancement for any fine-tuning pipeline. Code is available at https://github.com/NUS-HPC-AI-Lab/POME.
Updated: 2025-10-08 04:20:11
标题: POME: 通过Muon风格投影进行后优化模型编辑
摘要: 我们介绍了Post-Optimization Model Edit(POME),这是一种新算法,通过仅使用预训练和微调的检查点,而无需额外数据或进一步优化,来提高精细调整的大型语言模型的性能。其核心思想是对$\Delta W$进行μ子样式投影,即微调和预训练权重之间的差异。该投影使用截断奇异值分解(SVD)来均衡主导更新方向的影响,并修剪小奇异值,这些奇异值通常代表噪音。作为一个简单的后处理步骤,POME完全脱离训练流程。它不需要任何修改,也不会增加额外开销,因此可以与任何优化器或分布式框架兼容。POME提供一致的增益,将平均性能提升了+2.5\%的GSM8K和+1.0\%的代码生成。其广泛适用性--从7B基础模型到72B RLHF指导模型--将其确立为任何微调流程的实用、零成本增强。代码可在https://github.com/NUS-HPC-AI-Lab/POME找到。
更新时间: 2025-10-08 04:20:11
领域: cs.LG
DPGIIL: Dirichlet Process-Deep Generative Model-Integrated Incremental Learning for Clustering in Transmissibility-based Online Structural Anomaly Detection
Clustering based on vibration responses, such as transmissibility functions (TFs), is promising in structural anomaly detection. However, most existing methods struggle to determine the optimal cluster number, handle high-dimensional streaming data, and rely heavily on manually engineered features due to their shallow structures. To address these issues, this work proposes a novel clustering framework, referred to as Dirichlet process-deep generative model-integrated incremental learning (DPGIIL), for online structural anomaly detection, which combines the advantages of deep generative models (DGMs) in representation learning and the Dirichlet process mixture model (DPMM) in identifying distinct patterns in observed data. Within the context of variational Bayesian inference, a lower bound on the log marginal likelihood of DPGIIL, tighter than the evidence lower bound, is derived analytically, which enables the joint optimization of DGM and DPMM parameters, thereby allowing the DPMM to regularize the DGM's feature extraction process. Additionally, a greedy split-merge scheme-based coordinate ascent variational inference method is devised to accelerate the optimization. The summary statistics of the DPMM, along with the network parameters, are used to retain information about previous data for incremental learning. For online structural anomaly detection, DPGIIL can not only detect anomalies by dynamically assigning incoming data to new clusters but also indicate different structural states using distinct clusters, thereby providing additional information about the operating conditions of the monitored structure compared to traditional anomaly detectors. Three case studies demonstrate the dynamic adaptability of the proposed method and show that it outperforms some state-of-the-art approaches in both structural anomaly detection and clustering.
Updated: 2025-10-08 04:14:42
标题: DPGIIL:基于传播的在线结构异常检测中的簇集成增量学习的狄利克雷过程-深度生成模型
摘要: 基于振动响应(如传递函数TFs)的聚类在结构异常检测方面具有很大潜力。然而,大多数现有方法在确定最佳聚类数、处理高维度流数据以及依赖手动设计特征方面存在困难,这是由于它们的浅层结构。为了解决这些问题,本研究提出了一种新的聚类框架,称为Dirichlet过程-深度生成模型-集成增量学习(DPGIIL),用于在线结构异常检测,结合了深度生成模型(DGMs)在表示学习方面的优势以及Dirichlet过程混合模型(DPMM)在识别观察数据中不同模式的优势。在变分贝叶斯推断的背景下,对DPGIIL的log边际似然度进行了解析的下界,比证据下界更紧,从而使DGM和DPMM参数的联合优化成为可能,从而使DPMM能够规范DGM的特征提取过程。此外,设计了基于贪婪分裂-并并方案的坐标上升变分推断方法,以加速优化。DPMM的摘要统计数据,以及网络参数,用于保留先前数据的信息以进行增量学习。对于在线结构异常检测,DPGIIL不仅可以通过动态将传入数据分配给新簇来检测异常,还可以使用不同簇指示不同的结构状态,从而提供比传统异常检测器更多有关受监控结构的运行状况的信息。三个案例研究展示了所提出方法的动态适应性,并表明在结构异常检测和聚类方面优于一些最新方法。
更新时间: 2025-10-08 04:14:42
领域: cs.LG,physics.data-an,stat.ML
DPA-Net: A Dual-Path Attention Neural Network for Inferring Glycemic Control Metrics from Self-Monitored Blood Glucose Data
Continuous glucose monitoring (CGM) provides dense and dynamic glucose profiles that enable reliable estimation of Ambulatory Glucose Profile (AGP) metrics, such as Time in Range (TIR), Time Below Range (TBR), and Time Above Range (TAR). However, the high cost and limited accessibility of CGM restrict its widespread adoption, particularly in low- and middle-income regions. In contrast, self-monitoring of blood glucose (SMBG) is inexpensive and widely available but yields sparse and irregular data that are challenging to translate into clinically meaningful glycemic metrics. In this work, we propose a Dual-Path Attention Neural Network (DPA-Net) to estimate AGP metrics directly from SMBG data. DPA-Net integrates two complementary paths: (1) a spatial-channel attention path that reconstructs a CGM-like trajectory from sparse SMBG observations, and (2) a multi-scale ResNet path that directly predicts AGP metrics. An alignment mechanism between the two paths is introduced to reduce bias and mitigate overfitting. In addition, we develop an active point selector to identify realistic and informative SMBG sampling points that reflect patient behavioral patterns. Experimental results on a large, real-world dataset demonstrate that DPA-Net achieves robust accuracy with low errors while reducing systematic bias. To the best of our knowledge, this is the first supervised machine learning framework for estimating AGP metrics from SMBG data, offering a practical and clinically relevant decision-support tool in settings where CGM is not accessible.
Updated: 2025-10-08 04:06:22
标题: DPA-Net:一种用于从自我监测的血糖数据推断血糖控制指标的双路径注意力神经网络
摘要: 连续血糖监测(CGM)提供了密集和动态的血糖数据,使得可以可靠地估计出行血糖概况(AGP)指标,如在范围内的时间(TIR)、低于范围的时间(TBR)和高于范围的时间(TAR)。然而,CGM的高成本和有限的可及性限制了其在低收入和中等收入地区的广泛采用。相比之下,血糖自我监测(SMBG)成本低廉且易获得,但产生的数据稀疏且不规则,难以转化为临床意义的血糖指标。 在这项工作中,我们提出了一种双路径注意力神经网络(DPA-Net),可以直接从SMBG数据估计AGP指标。DPA-Net整合了两个互补的路径:(1)一个空间-通道注意力路径,从稀疏的SMBG观测中重构出类似CGM的轨迹,以及(2)一个多尺度ResNet路径,直接预测AGP指标。引入了两个路径之间的对准机制,以减少偏差和减轻过拟合。此外,我们开发了一个主动点选择器,用于识别反映患者行为模式的现实和信息丰富的SMBG采样点。 在一个大型实际数据集上的实验结果表明,DPA-Net在降低系统性偏差的同时,实现了良好的准确性和低误差。据我们所知,这是第一个用于从SMBG数据估计AGP指标的监督式机器学习框架,为在无法获取CGM的情况下提供了一种实用且临床相关的决策支持工具。
更新时间: 2025-10-08 04:06:22
领域: cs.LG
A Copula Based Supervised Filter for Feature Selection in Diabetes Risk Prediction Using Machine Learning
Effective feature selection is vital for robust and interpretable medical prediction, especially for identifying risk factors concentrated in extreme patient strata. Standard methods emphasize average associations and may miss predictors whose importance lies in the tails of the distribution. We propose a computationally efficient supervised filter that ranks features using the Gumbel copula upper tail dependence coefficient ($\lambda_U$), prioritizing variables that are simultaneously extreme with the positive class. We benchmarked against Mutual Information, mRMR, ReliefF, and $L_1$ Elastic Net across four classifiers on two diabetes datasets: a large public health survey (CDC, N=253,680) and a clinical benchmark (PIMA, N=768). Evaluation included paired statistical tests, permutation importance, and robustness checks with label flips, feature noise, and missingness. On CDC, our method was the fastest selector and reduced the feature space by about 52% while retaining strong discrimination. Although using all 21 features yielded the highest AUC, our filter significantly outperformed Mutual Information and mRMR and was statistically indistinguishable from ReliefF. On PIMA, with only eight predictors, our ranking produced the numerically highest ROC AUC, and no significant differences were found versus strong baselines. Across both datasets, the upper tail criterion consistently identified clinically coherent, impactful predictors. We conclude that copula based feature selection via upper tail dependence is a powerful, efficient, and interpretable approach for building risk models in public health and clinical medicine.
Updated: 2025-10-08 04:03:38
标题: 一种基于copula的监督滤波器在使用机器学习进行糖尿病风险预测中的特征选择
摘要: 有效的特征选择对于强健且可解释的医学预测至关重要,特别是用于识别集中在极端患者层中的风险因素。标准方法强调平均关联,并可能忽略在分布尾部重要性较高的预测变量。我们提出了一种计算效率高的监督过滤器,使用Gumbel copula上尾依赖系数($\lambda_U$)对特征进行排名,优先考虑与正类同时极端的变量。我们对两个糖尿病数据集(一个大型公共卫生调查(CDC,N=253,680)和一个临床基准(PIMA,N=768))在四个分类器上进行了基准测试,包括相互信息、mRMR、ReliefF和$L_1$弹性网。评估包括配对统计检验,置换重要性,以及标签翻转、特征噪声和缺失性的鲁棒性检查。在CDC上,我们的方法是速度最快的选择器,将特征空间缩减约52%,同时保持了强大的区分能力。尽管使用全部21个特征产生了最高的AUC,我们的过滤器明显优于相互信息和mRMR,并在统计上与ReliefF无法区分。在PIMA上,仅有八个预测变量,我们的排名产生了数值最高的ROC AUC,并与强基线没有显著差异。在两个数据集中,上尾准则始终确定了临床连贯、有影响力的预测因子。我们得出结论,基于copula的特征选择通过上尾依赖是在公共卫生和临床医学中构建风险模型的一种强大、高效和可解释的方法。
更新时间: 2025-10-08 04:03:38
领域: stat.ML,cs.LG
Reinforcement Learning for Dynamic Memory Allocation
In recent years, reinforcement learning (RL) has gained popularity and has been applied to a wide range of tasks. One such popular domain where RL has been effective is resource management problems in systems. We look to extend work on RL for resource management problems by considering the novel domain of dynamic memory allocation management. We consider dynamic memory allocation to be a suitable domain for RL since current algorithms like first-fit, best-fit, and worst-fit can fail to adapt to changing conditions and can lead to fragmentation and suboptimal efficiency. In this paper, we present a framework in which an RL agent continuously learns from interactions with the system to improve memory management tactics. We evaluate our approach through various experiments using high-level and low-level action spaces and examine different memory allocation patterns. Our results show that RL can successfully train agents that can match and surpass traditional allocation strategies, particularly in environments characterized by adversarial request patterns. We also explore the potential of history-aware policies that leverage previous allocation requests to enhance the allocator's ability to handle complex request patterns. Overall, we find that RL offers a promising avenue for developing more adaptive and efficient memory allocation strategies, potentially overcoming limitations of hardcoded allocation algorithms.
Updated: 2025-10-08 04:03:22
标题: 强化学习在动态内存分配中的应用
摘要: 最近几年,强化学习(RL)越来越受欢迎,并被应用于各种任务中。其中一个RL表现良好的领域是系统中的资源管理问题。我们希望通过考虑动态内存分配管理这一新领域来扩展RL在资源管理问题上的工作。我们认为动态内存分配是RL的适合领域,因为当前的算法如首次适应、最佳适应和最坏适应可能无法适应不断变化的条件,导致碎片化和效率低下。在本文中,我们提出了一个框架,其中一个RL代理不断从与系统的交互中学习,以改进内存管理策略。我们通过使用高级和低级动作空间进行各种实验来评估我们的方法,并研究不同的内存分配模式。我们的结果表明,RL可以成功训练出能够匹配和超越传统分配策略的代理,特别是在具有对抗性请求模式的环境中。我们还探讨了利用先前分配请求来增强分配器处理复杂请求模式能力的基于历史的策略的潜力。总体而言,我们发现RL为开发更具适应性和高效性的内存分配策略提供了一个有前途的途径,可能克服硬编码分配算法的局限性。
更新时间: 2025-10-08 04:03:22
领域: cs.LG,cs.OS
FEAorta: A Fully Automated Framework for Finite Element Analysis of the Aorta From 3D CT Images
Aortic aneurysm disease ranks consistently in the top 20 causes of death in the U.S. population. Thoracic aortic aneurysm is manifested as an abnormal bulging of thoracic aortic wall and it is a leading cause of death in adults. From the perspective of biomechanics, rupture occurs when the stress acting on the aortic wall exceeds the wall strength. Wall stress distribution can be obtained by computational biomechanical analyses, especially structural Finite Element Analysis. For risk assessment, probabilistic rupture risk of TAA can be calculated by comparing stress with material strength using a material failure model. Although these engineering tools are currently available for TAA rupture risk assessment on patient specific level, clinical adoption has been limited due to two major barriers: labor intensive 3D reconstruction current patient specific anatomical modeling still relies on manual segmentation, making it time consuming and difficult to scale to a large patient population, and computational burden traditional FEA simulations are resource intensive and incompatible with time sensitive clinical workflows. The second barrier was successfully overcome by our team through the development of the PyTorch FEA library and the FEA DNN integration framework. By incorporating the FEA functionalities within PyTorch FEA and applying the principle of static determinacy, we reduced the FEA based stress computation time to approximately three minutes per case. Moreover, by integrating DNN and FEA through the PyTorch FEA library, our approach further decreases the computation time to only a few seconds per case. This work focuses on overcoming the first barrier through the development of an end to end deep neural network capable of generating patient specific finite element meshes of the aorta directly from 3D CT images.
Updated: 2025-10-08 04:00:46
标题: FEAorta:一种用于基于3D CT图像对主动脉进行有限元分析的全自动化框架
摘要: 主动脉瘤病在美国人口中一直稳居前20位死因之列。胸主动脉瘤表现为胸主动脉壁异常膨胀,是成年人死亡的主要原因之一。从生物力学的角度来看,当作用于主动脉壁的应力超过壁强度时,会发生破裂。通过计算生物力学分析,特别是结构有限元分析,可以得到壁应力分布。为了风险评估,可以通过比较应力与材料强度,使用材料破坏模型来计算TAA的概率性破裂风险。尽管这些工程工具目前已经可以用于TAA破裂风险评估的患者特定水平,但由于两个主要障碍,临床应用受到限制:劳动密集型的3D重建,目前患者特定解剖建模仍依赖手动分割,使得耗时且难以扩展到大规模患者群体,以及计算负担传统有限元分析模拟资源密集,不适用于时间敏感的临床工作流程。第二个障碍已经成功地被我们的团队克服,通过开发PyTorch FEA库和FEA DNN集成框架。通过在PyTorch FEA内部集成FEA功能,并应用静态确定原理,我们将基于FEA的应力计算时间减少到每个案例约三分钟。此外,通过将DNN和FEA通过PyTorch FEA库集成,我们的方法进一步将计算时间缩短至每个案例仅几秒钟。这项工作致力于通过开发一种端对端深度神经网络,能够直接从3D CT图像生成患者特定的主动脉有限元网格,以克服第一个障碍。
更新时间: 2025-10-08 04:00:46
领域: eess.IV,cs.CE,cs.CV,cs.LG
SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models
Text-to-image models have shown remarkable capabilities in generating high-quality images from natural language descriptions. However, these models are highly vulnerable to adversarial prompts, which can bypass safety measures and produce harmful content. Despite various defensive strategies, achieving robustness against attacks while maintaining practical utility in real-world applications remains a significant challenge. To address this issue, we first conduct an empirical study of the text encoder in the Stable Diffusion (SD) model, which is a widely used and representative text-to-image model. Our findings reveal that the [EOS] token acts as a semantic aggregator, exhibiting distinct distributional patterns between benign and adversarial prompts in its embedding space. Building on this insight, we introduce \textbf{SafeGuider}, a two-step framework designed for robust safety control without compromising generation quality. SafeGuider combines an embedding-level recognition model with a safety-aware feature erasure beam search algorithm. This integration enables the framework to maintain high-quality image generation for benign prompts while ensuring robust defense against both in-domain and out-of-domain attacks. SafeGuider demonstrates exceptional effectiveness in minimizing attack success rates, achieving a maximum rate of only 5.48\% across various attack scenarios. Moreover, instead of refusing to generate or producing black images for unsafe prompts, \textbf{SafeGuider} generates safe and meaningful images, enhancing its practical utility. In addition, SafeGuider is not limited to the SD model and can be effectively applied to other text-to-image models, such as the Flux model, demonstrating its versatility and adaptability across different architectures. We hope that SafeGuider can shed some light on the practical deployment of secure text-to-image systems.
Updated: 2025-10-08 04:00:39
标题: SafeGuider:用于文本到图像模型的稳健实用内容安全控制
摘要: 文本到图像模型已经展示出在从自然语言描述生成高质量图像方面的显著能力。然而,这些模型对于对抗性提示非常脆弱,这些提示可以绕过安全措施并产生有害内容。尽管有各种防御策略,但在保持实际应用中的实用性的同时实现对抗攻击的稳健性仍然是一个重大挑战。为了解决这个问题,我们首先对稳定扩散(SD)模型中的文本编码器进行了经验研究,这是一种广泛使用且代表性的文本到图像模型。我们的发现揭示了[EOS]标记作为语义聚合器,在其嵌入空间中在良性和对抗性提示之间展现出不同的分布模式。基于这一认识,我们引入了SafeGuider,这是一个为了稳健的安全控制而设计的两步框架,而不会影响生成质量。SafeGuider结合了一个嵌入级别的识别模型和一个安全感知的特征擦除波束搜索算法。这种集成使得该框架能够在良性提示下保持高质量图像生成,同时确保对领域内和领域外攻击的稳健防御。SafeGuider在最小化攻击成功率方面表现出卓越的效果,在各种攻击场景中仅达到5.48%的最大速率。此外,与拒绝为不安全提示生成或生成黑色图像不同,SafeGuider生成安全且有意义的图像,增强了其实际效用。此外,SafeGuider不局限于SD模型,可以有效应用于其他文本到图像模型,如Flux模型,展示了其跨不同架构的多功能性和适应性。我们希望SafeGuider能为安全的文本到图像系统的实际部署提供一些启示。
更新时间: 2025-10-08 04:00:39
领域: cs.CR,cs.AI,cs.CV,I.2
Conditional Local Independence Testing for Itô processes with Applications to Dynamic Causal Discovery
Inferring causal relationships from dynamical systems is the central interest of many scientific inquiries. Conditional local independence, which describes whether the evolution of one process is influenced by another process given additional processes, is important for causal learning in such systems. In this paper, we propose a hypothesis test for conditional local independence in It\^o processes. Our test is grounded in the semimartingale decomposition of the It\^o process, with which we introduce a stochastic integral process that is a martingale under the null hypothesis. We then apply a test for the martingale property, quantifying potential deviation from local independence. The test statistics is estimated using the optimal filtering equation. We show the consistency of the estimation, thereby establishing the level and power of our test. Numerical verification and a real-world application to causal discovery in brain resting-state fMRIs are conducted.
Updated: 2025-10-08 03:52:23
标题: Itô过程的条件局部独立性检验及其在动态因果发现中的应用
摘要: 从动力系统中推断因果关系是许多科学探究的核心兴趣。条件局部独立性描述了在给定额外过程的情况下,一个过程的演变是否受到另一个过程的影响,对于这类系统中的因果学习至关重要。在本文中,我们提出了一个假设检验,用于伊藤过程中的条件局部独立性。我们的测试基于伊藤过程的半鞍分解,通过引入在零假设下是鞅的随机积分过程。然后我们应用鞅性检验,量化局部独立性的潜在偏差。测试统计量是使用最优过滤方程估计的。我们展示了估计的一致性,从而确定了我们测试的水平和功效。我们进行了数值验证,并在大脑静息态fMRI中进行了真实世界应用,进行了因果发现。
更新时间: 2025-10-08 03:52:23
领域: stat.ME,cs.LG
Interpretable Clustering: A Survey
In recent years, much of the research on clustering algorithms has primarily focused on enhancing their accuracy and efficiency, frequently at the expense of interpretability. However, as these methods are increasingly being applied in high-stakes domains such as healthcare, finance, and autonomous systems, the need for transparent and interpretable clustering outcomes has become a critical concern. This is not only necessary for gaining user trust but also for satisfying the growing ethical and regulatory demands in these fields. Ensuring that decisions derived from clustering algorithms can be clearly understood and justified is now a fundamental requirement. To address this need, this paper provides a comprehensive and structured review of the current state of explainable clustering algorithms, identifying key criteria to distinguish between various methods. These insights can effectively assist researchers in making informed decisions about the most suitable explainable clustering methods for specific application contexts, while also promoting the development and adoption of clustering algorithms that are both efficient and transparent. For convenient access and reference, an open repository organizes representative and emerging interpretable clustering methods under the taxonomy proposed in this survey, available at https://github.com/hulianyu/Awesome-Interpretable-Clustering
Updated: 2025-10-08 03:50:33
标题: 可解释性聚类:综述
摘要: 近年来,对聚类算法的研究主要集中在提高其准确性和效率上,经常以解释性为代价。然而,随着这些方法越来越多地应用于医疗保健、金融和自主系统等高风险领域,透明和可解释的聚类结果的需求已经成为一个关键问题。这不仅是为了获得用户信任,还为了满足这些领域日益增长的伦理和监管要求。确保从聚类算法导出的决策能够清晰理解和证明现在是一个基本要求。为了满足这一需求,本文提供了对当前可解释性聚类算法的全面和结构化的评估,识别出区分各种方法的关键标准。这些见解可以有效地帮助研究人员在特定应用背景下做出明智的决策,同时促进既高效又透明的聚类算法的发展和采用。为了方便访问和参考,一个开放的存储库按照本调查提出的分类法组织了代表性和新兴的可解释性聚类方法,可在https://github.com/hulianyu/Awesome-Interpretable-Clustering找到。
更新时间: 2025-10-08 03:50:33
领域: cs.LG,cs.AI,J.0; I.5; I.2.6; I.2.4; H.1
Code Agent can be an End-to-end System Hacker: Benchmarking Real-world Threats of Computer-use Agent
Computer-use agent (CUA) frameworks, powered by large language models (LLMs) or multimodal LLMs (MLLMs), are rapidly maturing as assistants that can perceive context, reason, and act directly within software environments. Among their most critical applications is operating system (OS) control. As CUAs in the OS domain become increasingly embedded in daily operations, it is imperative to examine their real-world security implications, specifically whether CUAs can be misused to perform realistic, security-relevant attacks. Existing works exhibit four major limitations: Missing attacker-knowledge model on tactics, techniques, and procedures (TTP), Incomplete coverage for end-to-end kill chains, unrealistic environment without multi-host and encrypted user credentials, and unreliable judgment dependent on LLM-as-a-Judge. To address these gaps, we propose AdvCUA, the first benchmark aligned with real-world TTPs in MITRE ATT&CK Enterprise Matrix, which comprises 140 tasks, including 40 direct malicious tasks, 74 TTP-based malicious tasks, and 26 end-to-end kill chains, systematically evaluates CUAs under a realistic enterprise OS security threat in a multi-host environment sandbox by hard-coded evaluation. We evaluate the existing five mainstream CUAs, including ReAct, AutoGPT, Gemini CLI, Cursor CLI, and Cursor IDE based on 8 foundation LLMs. The results demonstrate that current frontier CUAs do not adequately cover OS security-centric threats. These capabilities of CUAs reduce dependence on custom malware and deep domain expertise, enabling even inexperienced attackers to mount complex enterprise intrusions, which raises social concern about the responsibility and security of CUAs.
Updated: 2025-10-08 03:35:23
标题: 代码代理可以成为端到端系统黑客:对计算机使用代理真实世界威胁的基准测试
摘要: 计算机使用代理(CUA)框架,由大型语言模型(LLMs)或多模态LLMs(MLLMs)提供支持,正迅速发展为可以在软件环境中感知上下文,推理并直接行动的助手。它们最重要的应用之一是操作系统(OS)控制。随着OS领域中的CUAs越来越多地嵌入到日常操作中,有必要检查它们在现实世界安全方面的影响,特别是CUAs是否可以被滥用来执行现实且与安全相关的攻击。现有研究存在四个主要限制:缺少有关攻击者战术、技术和程序(TTP)的知识模型,对端到端杀链的覆盖不完整,没有多主机和加密用户凭证的不切实际环境,以及依赖LLM作为判断的不可靠评估。为了解决这些空白,我们提出了AdvCUA,与MITRE ATT&CK企业矩阵中的真实TTPs相一致的第一个基准,包括140个任务,其中包括40个直接恶意任务,74个基于TTP的恶意任务和26个端到端杀链,在一个真实的企业OS安全威胁下,通过硬编码评估在多主机环境沙盒中系统评估CUAs。我们评估了现有的五种主流CUAs,包括ReAct,AutoGPT,Gemini CLI,Cursor CLI和Cursor IDE,基于8个基础LLMs。结果表明,当前的前沿CUAs并不充分涵盖OS安全中心威胁。这些CUAs的能力降低了对定制恶意软件和深度领域专业知识的依赖,甚至使经验不足的攻击者能够发动复杂的企业入侵,这引发了社会对CUAs的责任和安全性的担忧。
更新时间: 2025-10-08 03:35:23
领域: cs.CR
PolyKAN: A Polyhedral Analysis Framework for Provable and Approximately Optimal KAN Compression
Kolmogorov-Arnold Networks (KANs) have emerged as a promising alternative to traditional Multi-Layer Perceptrons (MLPs), offering enhanced interpretability and a solid mathematical foundation. However, their parameter efficiency remains a significant challenge for practical deployment. This paper introduces PolyKAN, a novel theoretical framework for KAN compression that provides formal guarantees on both model size reduction and approximation error. By leveraging the inherent piecewise polynomial structure of KANs, we formulate the compression problem as a polyhedral region merging task. We establish a rigorous polyhedral characterization of KANs, develop a complete theory of $\epsilon$-equivalent compression, and design a dynamic programming algorithm that achieves approximately optimal compression under specified error bounds. Our theoretical analysis demonstrates that PolyKAN achieves provably near-optimal compression while maintaining strict error control, with guaranteed global optimality for univariate spline functions. This framework provides the first formal foundation for KAN compression with mathematical guarantees, opening new directions for the efficient deployment of interpretable neural architectures.
Updated: 2025-10-08 03:27:57
标题: PolyKAN:一种基于多面体分析的可证明和近似最优KAN压缩框架
摘要: 科尔莫哥洛夫-阿诺德网络(KANs)已经成为传统多层感知器(MLPs)的一种有前途的替代方案,提供了增强的可解释性和坚实的数学基础。然而,它们的参数效率仍然是实际部署面临的重大挑战。本文介绍了PolyKAN,这是一种针对KAN压缩的新型理论框架,它提供了关于模型大小减小和逼近误差的正式保证。通过利用KANs固有的分段多项式结构,我们将压缩问题形式化为一个多面体区域合并任务。我们建立了KANs的严格多面体特征化,发展了一个完整的$\epsilon$-等价压缩理论,并设计了一个动态规划算法,在指定的误差界限下实现近乎最优的压缩。我们的理论分析表明,PolyKAN实现了可证近乎最优的压缩,同时保持严格的误差控制,对于单变量样条函数具有全局最优性的保证。这个框架为KAN压缩提供了第一个具有数学保证的正式基础,为高效部署可解释的神经结构开辟了新的方向。
更新时间: 2025-10-08 03:27:57
领域: cs.LG,cs.AI,cs.NA,math.NA,math.OC,68T07, 41A15, 52B11,F.2.2; G.1.2; I.2.6
Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation
The substantial investment required to develop Large Language Models (LLMs) makes them valuable intellectual property, raising significant concerns about copyright protection. LLM fingerprinting has emerged as a key technique to address this, which aims to verify a model's origin by extracting an intrinsic, unique signature (a "fingerprint") and comparing it to that of a source model to identify illicit copies. However, existing black-box fingerprinting methods often fail to generate distinctive LLM fingerprints. This ineffectiveness arises because black-box methods typically rely on model outputs, which lose critical information about the model's unique parameters due to the usage of non-linear functions. To address this, we first leverage Fisher Information Theory to formally demonstrate that the gradient of the model's input is a more informative feature for fingerprinting than the output. Based on this insight, we propose ZeroPrint, a novel method that approximates these information-rich gradients in a black-box setting using zeroth-order estimation. ZeroPrint overcomes the challenge of applying this to discrete text by simulating input perturbations via semantic-preserving word substitutions. This operation allows ZeroPrint to estimate the model's Jacobian matrix as a unique fingerprint. Experiments on the standard benchmark show ZeroPrint achieves a state-of-the-art effectiveness and robustness, significantly outperforming existing black-box methods.
Updated: 2025-10-08 03:27:38
标题: 在行间阅读:通过零阶梯度估算实现可靠的黑盒LLM指纹识别
摘要: 大规模语言模型(LLMs)的开发需要大量投资,使它们成为有价值的知识产权,引发了对版权保护的重大关注。LLM指纹识别已经成为解决这一问题的关键技术,旨在通过提取一个内在的、独特的签名(“指纹”)并将其与源模型的签名进行比较来验证模型的来源,以识别非法拷贝。然而,现有的黑盒指纹识别方法通常无法生成独特的LLM指纹。这种无效性是由于黑盒方法通常依赖于模型输出,这些输出由于使用非线性函数而丢失了有关模型独特参数的关键信息。为了解决这个问题,我们首先利用费舍尔信息理论正式证明了模型输入的梯度比输出更具信息量,更适合用于指纹识别。基于这一见解,我们提出了ZeroPrint,一种利用零阶估计在黑盒设置中近似这些信息丰富梯度的新方法。ZeroPrint通过模拟语义保持的词替换来处理离散文本的输入扰动。这个操作使ZeroPrint能够估计模型的雅可比矩阵作为独特的指纹。在标准基准测试上的实验表明,ZeroPrint取得了最先进的效果和稳健性,明显优于现有的黑盒方法。
更新时间: 2025-10-08 03:27:38
领域: cs.CR,cs.AI,cs.CL
Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment
Large language models are sometimes trained with imperfect oversight signals, leading to undesired behaviors such as reward hacking and sycophancy. Improving oversight quality can be expensive or infeasible, motivating methods that improve learned behavior despite an imperfect training signal. We introduce Inoculation Prompting (IP), a simple but counterintuitive technique that prevents learning of an undesired behavior by modifying training prompts to explicitly request it. For example, to inoculate against reward hacking, we modify the prompts used in supervised fine-tuning to request code that only works on provided test cases but fails on other inputs. Across four settings we find that IP reduces the learning of undesired behavior without substantially reducing the learning of desired capabilities. We also show that prompts which more strongly elicit the undesired behavior prior to fine-tuning more effectively inoculate against the behavior when used during training; this serves as a heuristic to identify promising inoculation prompts. Overall, IP is a simple yet effective way to control how models generalize from fine-tuning, preventing learning of undesired behaviors without substantially disrupting desired capabilities.
Updated: 2025-10-08 03:13:07
标题: 接种提示:在火车时间教导LLMs在测试时间对齐时表现不端
摘要: 大型语言模型有时会在缺乏完美监督信号的情况下进行训练,导致不良行为,例如操纵奖励和谄媚。改善监督质量可能昂贵或不可行,这促使我们提出了一种方法,即改进学习行为,尽管训练信号不完美。我们引入了免疫提示(IP),这是一种简单但反直觉的技术,通过修改训练提示来明确要求它,以防止学习不良行为。例如,为了预防奖励操纵,我们修改了监督微调中使用的提示,以要求只适用于提供的测试用例但在其他输入上失败的代码。在四种设置中,我们发现IP减少了不良行为的学习,而不会显著减少所需能力的学习。我们还展示,在微调之前更强烈引发不良行为的提示,在训练过程中更有效地预防该行为;这可以作为一种启发式方法来识别有前途的接种提示。总的来说,IP是一种简单而有效的方式,可以控制模型从微调中的泛化,防止学习不良行为,而不会显著破坏所需的能力。
更新时间: 2025-10-08 03:13:07
领域: cs.LG
SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation
The performance of machine learning models depends heavily on training data. The scarcity of large-scale, well-annotated datasets poses significant challenges in creating robust models. To address this, synthetic data generated through simulations and generative models has emerged as a promising solution, enhancing dataset diversity and improving the performance, reliability, and resilience of models. However, evaluating the quality of this generated data requires an effective metric. This paper introduces the Synthetic Dataset Quality Metric (SDQM) to assess data quality for object detection tasks without requiring model training to converge. This metric enables more efficient generation and selection of synthetic datasets, addressing a key challenge in resource-constrained object detection tasks. In our experiments, SDQM demonstrated a strong correlation with the mean Average Precision (mAP) scores of YOLOv11, a leading object detection model, while previous metrics only exhibited moderate or weak correlations. Additionally, it provides actionable insights for improving dataset quality, minimizing the need for costly iterative training. This scalable and efficient metric sets a new standard for evaluating synthetic data. The code for SDQM is available at https://github.com/ayushzenith/SDQM
Updated: 2025-10-08 03:01:26
标题: SDQM:用于目标检测数据集评估的合成数据质量度量
摘要: 机器学习模型的性能在很大程度上取决于训练数据。缺乏大规模、良好注释的数据集在创建稳健模型方面存在重大挑战。为解决这一问题,通过模拟和生成模型生成的合成数据已成为一种有前途的解决方案,增强了数据集的多样性,提高了模型的性能、可靠性和韧性。然而,评估这些生成数据的质量需要一个有效的度量标准。本文介绍了合成数据集质量度量(SDQM),用于评估目标检测任务的数据质量,而无需模型训练达到收敛。该度量标准使得更高效地生成和选择合成数据集,解决了资源受限的目标检测任务中的一个关键挑战。在我们的实验中,SDQM与领先的目标检测模型YOLOv11的平均精度(mAP)得分呈强相关,而之前的度量标准仅呈现出中等或较弱的相关性。此外,它提供了改善数据集质量的可行见解,最大程度地减少了昂贵的迭代训练需求。这种可扩展和高效的度量标准为评估合成数据设定了新的标准。SDQM的代码可在https://github.com/ayushzenith/SDQM找到。
更新时间: 2025-10-08 03:01:26
领域: cs.CV,cs.AI,cs.IT,cs.LG,math.IT
ACT-Tensor: Tensor Completion Framework for Financial Dataset Imputation
Missing data in financial panels presents a critical obstacle, undermining asset-pricing models and reducing the effectiveness of investment strategies. Such panels are often inherently multi-dimensional, spanning firms, time, and financial variables, which adds complexity to the imputation task. Conventional imputation methods often fail by flattening the data's multidimensional structure, struggling with heterogeneous missingness patterns, or overfitting in the face of extreme data sparsity. To address these limitations, we introduce an Adaptive, Cluster-based Temporal smoothing tensor completion framework (ACT-Tensor) tailored for severely and heterogeneously missing multi-dimensional financial data panels. ACT-Tensor incorporates two key innovations: a cluster-based completion module that captures cross-sectional heterogeneity by learning group-specific latent structures; and a temporal smoothing module that proactively removes short-lived noise while preserving slow-moving fundamental trends. Extensive experiments show that ACT-Tensor consistently outperforms state-of-the-art benchmarks in terms of imputation accuracy across a range of missing data regimes, including extreme sparsity scenarios. To assess its practical financial utility, we evaluate the imputed data with an asset-pricing pipeline tailored for tensor-structured financial data. Results show that ACT-Tensor not only reduces pricing errors but also significantly improves risk-adjusted returns of the constructed portfolio. These findings confirm that our method delivers highly accurate and informative imputations, offering substantial value for financial decision-making.
Updated: 2025-10-08 02:59:25
标题: ACT-Tensor: 金融数据填充的张量完成框架
摘要: 金融面板数据中的缺失数据构成了一个重要障碍,破坏了资产定价模型,并降低了投资策略的有效性。这种面板通常固有地是多维的,涵盖了公司、时间和金融变量,这给填补数据的任务增加了复杂性。传统的填补方法经常失败,因为它们会使数据的多维结构变得扁平化,难以应对异质性缺失模式,或在极端数据稀疏的情况下过拟合。为了解决这些限制,我们引入了一种适用于严重和异质缺失多维金融数据面板的自适应、基于聚类的时间平滑张量完成框架(ACT-Tensor)。ACT-Tensor包含两个关键创新:一个基于聚类的完成模块,通过学习特定组的潜在结构来捕捉横截面异质性;以及一个时间平滑模块,主动去除短暂的噪声,同时保留缓慢移动的基本趋势。大量实验证明,ACT-Tensor在填补精度方面始终优于最先进的基准,包括极端稀疏场景。为了评估其实际的金融效用,我们使用了专为张量结构金融数据设计的资产定价流程来评估填补的数据。结果显示,ACT-Tensor不仅减少了定价误差,还显著提高了构建投资组合的风险调整收益。这些发现证实了我们的方法提供了高度准确和信息丰富的填补,为金融决策提供了实质性价值。
更新时间: 2025-10-08 02:59:25
领域: stat.AP,cs.AI,cs.LG
Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning
This paper introduces novel Bellman mappings (B-Maps) for value iteration (VI) in distributed reinforcement learning (DRL), where agents are deployed over an undirected, connected graph/network with arbitrary topology -- but without a centralized node, that is, a node capable of aggregating all data and performing computations. Each agent constructs a nonparametric B-Map from its private data, operating on Q-functions represented in a reproducing kernel Hilbert space, with flexibility in choosing the basis for their representation. Agents exchange their Q-function estimates only with direct neighbors, and unlike existing DRL approaches that restrict communication to Q-functions, the proposed framework also enables the transmission of basis information in the form of covariance matrices, thereby conveying additional structural details. Linear convergence rates are established for both Q-function and covariance-matrix estimates toward their consensus values, regardless of the network topology, with optimal learning rates determined by the ratio of the smallest positive eigenvalue (the graph's Fiedler value) to the largest eigenvalue of the graph Laplacian matrix. A detailed performance analysis further shows that the proposed DRL framework effectively approximates the performance of a centralized node, had such a node existed. Numerical tests on two benchmark control problems confirm the effectiveness of the proposed nonparametric B-Maps relative to prior methods. Notably, the tests reveal a counter-intuitive outcome: although the framework involves richer information exchange -- specifically through transmitting covariance matrices as basis information -- it achieves the desired performance at a lower cumulative communication cost than existing DRL schemes, underscoring the critical role of sharing basis information in accelerating the learning process.
Updated: 2025-10-08 02:54:52
标题: 非参数贝尔曼映射在分布式强化学习中的值迭代
摘要: 本文介绍了一种用于分布式强化学习(DRL)中值迭代的新型贝尔曼映射(B-Maps)。在这种情况下,代理被部署在一个无向连接图/网络上,具有任意拓扑结构,但没有一个集中节点,即一个能够聚合所有数据并执行计算的节点。每个代理从其私有数据构建一个非参数B-Map,用于在再生核希尔伯特空间中表示的Q函数上操作,并可以灵活选择其表示的基础。代理仅与直接邻居交换其Q函数估计,与现有的限制通信仅限于Q函数的DRL方法不同,所提出的框架还能够传输协方差矩阵的基础信息,从而传达额外的结构细节。针对Q函数和协方差矩阵估计向其一致值的线性收敛速率建立,不受网络拓扑的影响,最佳学习速率由图拉普拉斯矩阵的最小正特征值(图的菲德勒值)与最大特征值之比确定。进一步的详细性能分析显示,所提出的DRL框架有效地近似了一个集中节点的性能,如果存在这样一个节点的话。对两个基准控制问题进行的数值测试证实了所提出的非参数B-Maps相对于先前方法的有效性。值得注意的是,测试结果显示一个令人费解的结果:尽管该框架涉及更丰富的信息交换,特别是通过传递协方差矩阵作为基础信息,但它以比现有的DRL方案更低的累计通信成本实现了所需的性能,强调了在加速学习过程中共享基础信息的关键作用。
更新时间: 2025-10-08 02:54:52
领域: cs.LG,eess.SP
Learning where to learn: Training data distribution optimization for scientific machine learning
In scientific machine learning, models are routinely deployed with parameter values or boundary conditions far from those used in training. This paper studies the learning-where-to-learn problem of designing a training data distribution that minimizes average prediction error across a family of deployment regimes. A theoretical analysis shows how the training distribution shapes deployment accuracy. This motivates two adaptive algorithms based on bilevel or alternating optimization in the space of probability measures. Discretized implementations using parametric distribution classes or nonparametric particle-based gradient flows deliver optimized training distributions that outperform nonadaptive designs. Once trained, the resulting models exhibit improved sample complexity and robustness to distribution shift. This framework unlocks the potential of principled data acquisition for learning functions and solution operators of partial differential equations.
Updated: 2025-10-08 02:51:54
标题: 学习何处学习:科学机器学习的训练数据分布优化
摘要: 在科学机器学习中,模型通常以远离训练中使用的参数值或边界条件部署。本文研究了设计训练数据分布以最小化一组部署方案中的平均预测误差的学习何时学习问题。理论分析显示训练分布如何塑造部署精度。这激发了基于双层或交替优化的自适应算法,这些算法在概率测度空间中进行。使用参数分布类或非参数粒子梯度流的离散化实现提供了优化训练分布,优于非自适应设计。一旦训练完成,所得到的模型表现出改进的样本复杂度和对分布偏移的鲁棒性。这个框架释放了有原则的数据采集潜力,用于学习偏微分方程的函数和解算子。
更新时间: 2025-10-08 02:51:54
领域: cs.LG,math.OC,stat.ML,62K05, 65K10 (Primary) 68T07, 65D15, 62R20, 60G57 (Secondary)
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
Multimodal large language models with Retrieval Augmented Generation (RAG) have significantly advanced tasks such as multimodal question answering by grounding responses in external text and images. This grounding improves factuality, reduces hallucination, and extends reasoning beyond parametric knowledge. However, this reliance on external knowledge poses a critical yet underexplored safety risk: knowledge poisoning attacks, where adversaries deliberately inject adversarial multimodal content into external knowledge bases to steer model toward generating incorrect or even harmful responses. To expose such vulnerabilities, we propose MM-PoisonRAG, the first framework to systematically design knowledge poisoning in multimodal RAG. We introduce two complementary attack strategies: Localized Poisoning Attack (LPA), which implants targeted multimodal misinformation to manipulate specific queries, and Globalized Poisoning Attack (GPA), which inserts a single adversarial knowledge to broadly disrupt reasoning and induce nonsensical responses across all queries. Comprehensive experiments across tasks, models, and access settings show that LPA achieves targeted manipulation with attack success rates of up to 56%, while GPA completely disrupts model generation to 0% accuracy with just a single adversarial knowledge injection. Our results reveal the fragility of multimodal RAG and highlight the urgent need for defenses against knowledge poisoning.
Updated: 2025-10-08 02:51:51
标题: MM-PoisonRAG: 用本地和全局毒害攻击破坏多模式RAG
摘要: 多模态大型语言模型与检索增强生成(RAG)在多模态问题回答等任务方面取得了显著进展,通过将响应基于外部文本和图像进行接地。这种接地改善了事实性,减少了幻觉,并将推理扩展到参数化知识之外。然而,这种对外部知识的依赖构成了一个关键但尚未被充分探讨的安全风险:知识毒害攻击,即对手故意向外部知识库注入敌对的多模态内容,以引导模型生成不正确甚至有害的响应。为了揭示这种脆弱性,我们提出了MM-PoisonRAG,这是第一个系统设计多模态RAG中知识毒害的框架。我们引入了两种互补的攻击策略:局部毒害攻击(LPA),用于植入有针对性的多模态错误信息来操纵特定查询,以及全局毒害攻击(GPA),用于插入单个敌对知识以广泛干扰推理并在所有查询中引发荒谬的响应。跨任务、模型和访问设置的全面实验表明,LPA实现了高达56%的攻击成功率的有针对性操纵,而GPA仅通过一次敌对知识注入就将模型生成完全破坏到0%的准确率。我们的结果揭示了多模态RAG的脆弱性,并强调了对抗知识毒害的紧迫需要。
更新时间: 2025-10-08 02:51:51
领域: cs.LG,cs.AI,cs.CR,cs.CV
CAPO: Towards Enhancing LLM Reasoning through Generative Credit Assignment
Reinforcement Learning with Verifiable Rewards (RLVR) has improved the reasoning abilities of Large Language Models (LLMs) by using rule-based binary feedback. However, current RLVR methods typically assign the same reward to every token. This coarse-grained feedback hampers precise credit assignment, making it hard for models to identify which reasoning steps lead to success or failure, and often results in suboptimal policies. Methods like PPO provide credit assignment by value estimation, but yield inaccurate and unverifiable signals due to limited sampling. On the other hand, methods using Process Reward Models can provide step-wise rewards but suffer from several key limitations: they require high-quality process supervision labels, the feedback is unreliable due to probabilistic reward modeling, and their application in online reinforcement learning (RL) is time-consuming. To overcome these limitations, we introduce a simple but efficient method-Credit Assignment Policy Optimization (CAPO). Instead of training auxiliary models, CAPO directly leverages an off-the-shelf, general-purpose LLM as a Generative Process Reward Model (LLM-as-GenPRM) to generate all step-wise critique by one pass only based on the correctness of the step itself, providing deterministic token-level credits to refine the tokens that were originally assigned identical rule-based rewards. To further enhance the accuracy and robustness, we employ voting mechanisms that scale with the number of generated critiques. Extensive experiments on various backbones like Llama and Qwen models show that CAPO consistently outperforms supervised learning-based and RL-based fine-tuning methods across four challenging mathematical benchmarks and three out-of-domain benchmarks. Further analysis shows that CAPO can help the model to foster the learning of correct reasoning pathways leading to correct answers.
Updated: 2025-10-08 02:10:47
标题: CAPO: 通过生成式信用分配增强LLM推理
摘要: 使用可验证奖励的强化学习(RLVR)通过使用基于规则的二进制反馈,提高了大型语言模型(LLMs)的推理能力。然而,当前的RLVR方法通常将相同的奖励分配给每个令牌。这种粗粒度的反馈阻碍了精确的信用分配,使模型难以确定哪些推理步骤导致成功或失败,并且通常导致次优策略。像PPO这样的方法通过值估计提供信用分配,但由于采样有限,产生不准确和不可验证的信号。另一方面,使用过程奖励模型的方法可以提供逐步奖励,但存在一些关键限制:它们需要高质量的过程监督标签,反馈由于概率奖励建模而不可靠,并且它们在在线强化学习(RL)中的应用耗时。为了克服这些限制,我们引入了一种简单但高效的方法-信用分配策略优化(CAPO)。 CAPO直接利用现成的通用LLM作为生成过程奖励模型(LLM-as-GenPRM),仅基于步骤本身的正确性一次传递生成所有逐步批评,提供确定性的令牌级信用,以改进最初分配相同基于规则的奖励的令牌。为了进一步增强准确性和鲁棒性,我们采用随生成批评数量增加而扩展的投票机制。对LLama和Qwen等不同基础模型进行的广泛实验表明,CAPO在四个具有挑战性的数学基准和三个域外基准上持续优于基于监督学习和RL的微调方法。进一步分析显示,CAPO可以帮助模型培养正确推理路径的学习,从而得出正确答案。
更新时间: 2025-10-08 02:10:47
领域: cs.LG,cs.AI,cs.CL
Scalable In-context Ranking with Generative Models
In-context Ranking (ICR) is an emerging paradigm for Information Retrieval (IR), which leverages contextual understanding of LLMs by directly incorporating the task description, candidate documents, and the query into the model's input prompt and tasking the LLM to identify relevant document(s). While it is effective, efficiency is a significant challenge in this paradigm, especially as the candidate list grows due to quadratic/super-linear scaling of attention operation with context length. To this end, this paper first identifies inherent and exploitable structures in the attention of LLMs finetuned for ICR: (1) inter-document block sparsity: attention is dense within each document block but sparse across different documents in the context; and (2) query-document block relevance: the attention scores from certain query tokens to a document block in middle layers strongly correlate with that document's actual relevance. Motivated by these observations, we introduce BlockRank (Blockwise In-context Ranking), a novel method that adapts the attention operation in an LLM by (a) architecturally enforcing the observed inter-document block sparsity, reducing attention complexity from quadratic to linear without loss in performance, and (b) optimizing query-document block relevance for true relevant documents during fine-tuning using an auxiliary contrastive training objective, improving retrieval in attention. Experiments on BEIR, MSMarco and NQ with Mistral-7B demonstrate that BlockRank Mistral matches or outperforms existing SOTA listwise rankers and controlled fine-tuned baseline while being significantly more efficient at inference (4.7x for 100 MSMarco documents in context) and scaling gracefully to long-context shortlists, around 500 documents in-context (approximately 100K context length) within a second, presenting a scalable and effective solution for ICR.
Updated: 2025-10-08 02:02:37
标题: 可扩展的上下文排名与生成模型
摘要: In-context Ranking(ICR)是信息检索(IR)的一种新兴范式,通过直接将任务描述、候选文档和查询整合到模型的输入提示中,利用LLM的上下文理解能力,要求LLM识别相关文档。尽管这种方法很有效,但效率是这种范式中的一个重要挑战,特别是当候选列表因上下文长度而呈二次/超线性增长时。因此,本文首先确定了LLM在ICR中微调时注意力中固有且可利用的结构:(1)文档块间的稀疏性:在每个文档块内注意力密集,但在上下文中不同文档之间稀疏;(2)查询-文档块相关性:某些查询标记到中间层文档块的注意力分数与该文档的实际相关性强相关。受到这些观察的启发,我们引入了BlockRank(基于块的上下文排名),这是一种通过(a)在LLM中强制执行观察到的文档块间的稀疏性,将注意力复杂度从二次降低到线性而不影响性能,并(b)使用辅助对比训练目标在微调过程中优化查询-文档块相关性以提高检索效率的新方法。在BEIR、MSMarco和NQ上的实验结果表明,BlockRank Mistral与现有的SOTA列表式排名器相匹配或表现更好,并且在推理过程中效率显著提高(在上下文中有100个MSMarco文档时提高了4.7倍),并且能够良好地扩展到长上下文的短列表,大约在一秒钟内处理500个文档(大约100K上下文长度),为ICR提供了可扩展且有效的解决方案。
更新时间: 2025-10-08 02:02:37
领域: cs.IR,cs.LG
Keep It on a Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning
Current long-tailed semi-supervised learning methods assume that labeled data exhibit a long-tailed distribution, and unlabeled data adhere to a typical predefined distribution (i.e., long-tailed, uniform, or inverse long-tailed). However, the distribution of the unlabeled data is generally unknown and may follow an arbitrary distribution. To tackle this challenge, we propose a Controllable Pseudo-label Generation (CPG) framework, expanding the labeled dataset with the progressively identified reliable pseudo-labels from the unlabeled dataset and training the model on the updated labeled dataset with a known distribution, making it unaffected by the unlabeled data distribution. Specifically, CPG operates through a controllable self-reinforcing optimization cycle: (i) at each training step, our dynamic controllable filtering mechanism selectively incorporates reliable pseudo-labels from the unlabeled dataset into the labeled dataset, ensuring that the updated labeled dataset follows a known distribution; (ii) we then construct a Bayes-optimal classifier using logit adjustment based on the updated labeled data distribution; (iii) this improved classifier subsequently helps identify more reliable pseudo-labels in the next training step. We further theoretically prove that this optimization cycle can significantly reduce the generalization error under some conditions. Additionally, we propose a class-aware adaptive augmentation module to further improve the representation of minority classes, and an auxiliary branch to maximize data utilization by leveraging all labeled and unlabeled samples. Comprehensive evaluations on various commonly used benchmark datasets show that CPG achieves consistent improvements, surpassing state-of-the-art methods by up to $\textbf{15.97%}$ in accuracy. The code is available at https://github.com/yaxinhou/CPG.
Updated: 2025-10-08 01:59:22
标题: 让它保持受控:可控伪标签生成朝向逼真的长尾半监督学习
摘要: 目前的长尾半监督学习方法假设有标记数据呈现长尾分布,而无标记数据遵循典型的预定义分布(即长尾、均匀或反向长尾)。然而,无标记数据的分布通常是未知的,可能遵循任意分布。为了解决这一挑战,我们提出了一个可控伪标签生成(CPG)框架,通过从无标记数据中逐渐确定可靠的伪标签扩展带标记数据集,并在已知分布的更新标记数据集上训练模型,使其不受无标记数据分布的影响。具体来说,CPG通过一个可控的自我强化优化循环运作:(i)在每个训练步骤中,我们的动态可控过滤机制选择性地将可靠的伪标签从无标记数据集中整合到带标记数据集中,确保更新后的标记数据集遵循已知分布;(ii)然后我们基于更新后的标记数据分布构建一个贝叶斯最优分类器进行逻辑调整;(iii)这个改进的分类器随后有助于在下一个训练步骤中识别更可靠的伪标签。我们进一步理论上证明这种优化循环在某些条件下可以显著降低泛化误差。此外,我们提出了一个类感知自适应增强模块来进一步改善少数类的表示,并通过利用所有带标记和无标记样本来最大化数据利用率的辅助分支。对各种常用基准数据集的全面评估表明,CPG取得了一致的改进,准确率超过现有方法高达15.97%。代码可在https://github.com/yaxinhou/CPG找到。
更新时间: 2025-10-08 01:59:22
领域: cs.CV,cs.LG
DACP: Domain-Adaptive Continual Pre-Training of Large Language Models for Phone Conversation Summarization
Large language models (LLMs) have achieved impressive performance in text summarization, yet their performance often falls short when applied to specialized domains that differ from their original pre-training distribution. While fine-tuning can improve summarization quality, it typically relies on costly and scarce high-quality labeled data. In this work, we explore continual pre-training as a scalable, self-supervised approach to adapt LLMs for downstream summarization tasks, particularly in the context of noisy real-world conversation transcripts. We conduct extensive experiments using large-scale, unlabeled business conversation data to investigate whether continual pre-training enhances model capabilities in conversational summarization. Our results demonstrate that continual pre-training yields substantial gains in both in-domain and out-of-domain summarization benchmarks, while maintaining strong generalization and robustness. We also analyze the effects of data selection strategies, providing practical guidelines for applying continual pre-training in summarization-focused industrial applications.
Updated: 2025-10-08 01:55:53
标题: DACP:面向电话对话摘要的大型语言模型领域自适应持续预训练
摘要: 大型语言模型(LLMs)在文本摘要方面取得了令人印象深刻的表现,但当应用于与其原始预训练分布不同的专业领域时,它们的性能往往不尽人意。虽然微调可以提高摘要质量,但通常依赖于昂贵且稀缺的高质量标记数据。在这项工作中,我们探讨了持续预训练作为一种可扩展的自监督方法,用于适应LLMs进行下游摘要任务,特别是在嘈杂的现实对话转录的情况下。我们使用大规模未标记的商业对话数据进行广泛实验,以探讨持续预训练是否增强了模型在对话摘要中的能力。我们的结果表明,持续预训练在领域内和领域外的摘要基准测试中都取得了显著收益,同时保持了强大的泛化能力和鲁棒性。我们还分析了数据选择策略的影响,为在面向摘要的工业应用中应用持续预训练提供了实用指导。
更新时间: 2025-10-08 01:55:53
领域: cs.CL,cs.AI,cs.LG
A Minimalist Bayesian Framework for Stochastic Optimization
The Bayesian paradigm offers principled tools for sequential decision-making under uncertainty, but its reliance on a probabilistic model for all parameters can hinder the incorporation of complex structural constraints. We introduce a minimalist Bayesian framework that places a prior only on the component of interest, such as the location of the optimum. Nuisance parameters are eliminated via profile likelihood, which naturally handles constraints. As a direct instantiation, we develop a MINimalist Thompson Sampling (MINTS) algorithm. Our framework accommodates structured problems, including continuum-armed Lipschitz bandits and dynamic pricing. It also provides a probabilistic lens on classical convex optimization algorithms such as the center of gravity and ellipsoid methods. We further analyze MINTS for multi-armed bandits and establish near-optimal regret guarantees.
Updated: 2025-10-08 01:52:40
标题: 一个极简的贝叶斯框架用于随机优化
摘要: 贝叶斯范式提供了在不确定性下进行顺序决策的原则性工具,但其对所有参数都依赖于概率模型可能会阻碍复杂结构约束的整合。我们引入了一个极简主义贝叶斯框架,仅在感兴趣的组件上放置先验,比如最优位置。通过剖面似然性,可以消除干扰参数,这自然地处理约束。作为直接实例化,我们开发了一种MINimalist Thompson Sampling(MINTS)算法。我们的框架适用于结构化问题,包括连续臂Lipschitz赌徒和动态定价。它还为经典的凸优化算法提供了概率视角,如重心和椭球方法。我们进一步分析了MINTS在多臂赌徒问题中的表现,并建立了接近最优的遗憾保证。
更新时间: 2025-10-08 01:52:40
领域: cs.LG,cs.AI,math.OC,stat.ML
An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems
In this paper, we introduce a systematic framework beyond conventional method to assess LLMs' mathematical-reasoning robustness by stress-testing them on advanced math problems that are mathematically equivalent but with linguistic and parametric variation. These transformations allow us to measure the sensitivity of LLMs to non-mathematical perturbations, thereby enabling a more accurate evaluation of their mathematical reasoning capabilities. Using this new evaluation methodology, we created PutnamGAP, a new benchmark dataset with multiple mathematically-equivalent variations of competition-level math problems. With the new dataset, we evaluate multiple families of representative LLMs and examine their robustness. Across 18 commercial and open-source models we observe sharp performance degradation on the variants. OpenAI's flagship reasoning model, O3, scores 51.5% on the originals but drops by 4.7 percentage points on surface-renaming variants, and by 12.9 percentage points on parametric variants, while smaller models fare far worse. Overall, the results show that the proposed new evaluation methodology is effective for deepening our understanding of the robustness of LLMs and generating new insights for further improving their mathematical reasoning capabilities.
Updated: 2025-10-08 01:46:12
标题: 对LLMs在数学推理中的鲁棒性进行调查:与高级数学问题的数学等效变换进行基准测试
摘要: 在这篇论文中,我们引入了一个超越传统方法的系统框架,通过在高级数学问题上对LLMs进行应力测试,以评估它们的数学推理鲁棒性。这些转换使我们能够衡量LLMs对非数学扰动的敏感性,从而更准确地评估它们的数学推理能力。利用这种新的评估方法,我们创建了PutnamGAP,一个新的基准数据集,其中包含多个在竞赛水平数学问题上具有数学等效变体。通过新数据集,我们评估了多个代表性LLMs家族,并检验它们的鲁棒性。在18个商业和开源模型中,我们观察到在变体上的性能急剧下降。OpenAI的旗舰推理模型O3在原始问题上得分为51.5%,但在表面重命名变体上下降了4.7个百分点,在参数变体上下降了12.9个百分点,而较小的模型表现得更差。总体而言,结果表明所提出的新评估方法对加深我们对LLMs鲁棒性的理解并为进一步改进它们的数学推理能力提供了新的见解。
更新时间: 2025-10-08 01:46:12
领域: cs.CL,cs.AI,cs.LG
The Framework That Survives Bad Models: Human-AI Collaboration For Clinical Trials
Artificial intelligence (AI) holds great promise for supporting clinical trials, from patient recruitment and endpoint assessment to treatment response prediction. However, deploying AI without safeguards poses significant risks, particularly when evaluating patient endpoints that directly impact trial conclusions. We compared two AI frameworks against human-only assessment for medical image-based disease evaluation, measuring cost, accuracy, robustness, and generalization ability. To stress-test these frameworks, we injected bad models, ranging from random guesses to naive predictions, to ensure that observed treatment effects remain valid even under severe model degradation. We evaluated the frameworks using two randomized controlled trials with endpoints derived from spinal X-ray images. Our findings indicate that using AI as a supporting reader (AI-SR) is the most suitable approach for clinical trials, as it meets all criteria across various model types, even with bad models. This method consistently provides reliable disease estimation, preserves clinical trial treatment effect estimates and conclusions, and retains these advantages when applied to different populations.
Updated: 2025-10-08 01:40:41
标题: 生存不良模型的框架:临床试验中的人工智能与人类协作
摘要: 人工智能(AI)在支持临床试验方面具有巨大的潜力,从患者招募和终点评估到治疗反应预测。然而,没有保障地部署人工智能存在重大风险,特别是在评估直接影响试验结论的患者终点时。我们将两种人工智能框架与仅人类评估进行比较,用于基于医学图像的疾病评估,衡量成本、准确性、鲁棒性和泛化能力。为了对这些框架进行压力测试,我们注入了从随机猜测到天真预测的糟糕模型,以确保即使在严重模型退化的情况下,观察到的治疗效果仍然有效。我们使用来自脊柱X射线图像的两项随机对照试验评估了这些框架的性能。我们的研究结果表明,将人工智能作为支持阅读器(AI-SR)是临床试验中最合适的方法,因为它满足各种模型类型的所有标准,即使在存在糟糕模型的情况下。这种方法始终提供可靠的疾病估计,保留临床试验治疗效果估计和结论,并在应用于不同人群时保留这些优势。
更新时间: 2025-10-08 01:40:41
领域: cs.LG,cs.AI,eess.IV
VAL-Bench: Measuring Value Alignment in Language Models
Large language models (LLMs) are increasingly used for tasks where outputs shape human decisions, so it is critical to test whether their responses reflect consistent human values. Existing benchmarks mostly track refusals or predefined safety violations, but these only check rule compliance and do not reveal whether a model upholds a coherent value system when facing controversial real-world issues. We introduce the Value ALignment Benchmark (VAL-Bench), which evaluates whether models maintain a stable value stance across paired prompts that frame opposing sides of public debates. VAL-Bench consists of 115K such pairs from Wikipedia's controversial sections. A well-aligned model should express similar underlying views regardless of framing, which we measure using an LLM-as-judge to score agreement or divergence between paired responses. Applied across leading open- and closed-source models, the benchmark reveals large variation in alignment and highlights trade-offs between safety strategies (e.g., refusals) and more expressive value systems. By providing a scalable, reproducible benchmark, VAL-Bench enables systematic comparison of how reliably LLMs embody human values.
Updated: 2025-10-08 01:35:03
标题: VAL-Bench:衡量语言模型中的价值对齐
摘要: 大型语言模型(LLMs)在越来越多的任务中被使用,这些任务涉及到塑造人类决策,因此测试它们的响应是否反映了一致的人类价值是至关重要的。现有的基准主要跟踪拒绝或预定义的安全违规行为,但这些只检查规则的遵守,不揭示模型在面对有争议的现实问题时是否坚持一致的价值体系。我们引入了价值对齐基准(VAL-Bench),评估模型是否在表达公开辩论的两个对立面之间保持稳定的价值立场。VAL-Bench包含来自维基百科有争议部分的115K个这样的对。一个良好对齐的模型应该在不同框架下表达相似的基本观点,我们使用LLM作为评委来评分配对响应之间的一致性或分歧。应用于领先的开源和闭源模型,该基准揭示了对齐度的巨大变化,并凸显了安全策略(如拒绝)和更具表现力的价值体系之间的权衡。通过提供可扩展、可复制的基准,VAL-Bench使得可以系统比较LLMs如何可靠地体现人类价值。
更新时间: 2025-10-08 01:35:03
领域: cs.AI,cs.CL
Auto-Stega: An Agent-Driven System for Lifelong Strategy Evolution in LLM-Based Text Steganography
With the rapid progress of LLMs, high quality generative text has become widely available as a cover for text steganography. However, prevailing methods rely on hand-crafted or pre-specified strategies and struggle to balance efficiency, imperceptibility, and security, particularly at high embedding rates. Accordingly, we propose Auto-Stega, an agent-driven self-evolving framework that is the first to realize self-evolving steganographic strategies by automatically discovering, composing, and adapting strategies at inference time; the framework operates as a closed loop of generating, evaluating, summarizing, and updating that continually curates a structured strategy library and adapts across corpora, styles, and task constraints. A decoding LLM recovers the information under the shared strategy. To handle high embedding rates, we introduce PC-DNTE, a plug-and-play algorithm that maintains alignment with the base model's conditional distribution at high embedding rates, preserving imperceptibility while enhancing security. Experimental results demonstrate that at higher embedding rates Auto-Stega achieves superior performance with gains of 42.2\% in perplexity and 1.6\% in anti-steganalysis performance over SOTA methods.
Updated: 2025-10-08 01:32:59
标题: 自动隐写:基于LLM的文本隐写术中的终身策略演化的代理驱动系统
摘要: 随着LLMs的快速发展,高质量的生成文本已经广泛应用于文本隐写术。然而,目前的方法依赖于手工制作或预先指定的策略,并且在高嵌入率下很难平衡效率、不可察觉性和安全性。因此,我们提出了Auto-Stega,这是第一个实现自我演化隐写术策略的框架,通过在推断时自动发现、组合和调整策略;该框架作为一个闭环运行,持续策划一个结构化策略库,并在语料库、风格和任务约束之间进行调整。解码LLM在共享策略下恢复信息。为了处理高嵌入率,我们引入了PC-DNTE,这是一种即插即用的算法,在高嵌入率下保持与基础模型的条件分布的一致性,同时保持不可察觉性,并增强安全性。实验结果表明,在更高的嵌入率下,Auto-Stega在困惑度上取得了42.2%的提升,并且比SOTA方法在反隐写术性能上提升了1.6%。
更新时间: 2025-10-08 01:32:59
领域: cs.CR
Adapting Quantum Machine Learning for Energy Dissociation of Bonds
Accurate prediction of bond dissociation energies (BDEs) underpins mechanistic insight and the rational design of molecules and materials. We present a systematic, reproducible benchmark comparing quantum and classical machine learning models for BDE prediction using a chemically curated feature set encompassing atomic properties (atomic numbers, hybridization), bond characteristics (bond order, type), and local environmental descriptors. Our quantum framework, implemented in Qiskit Aer on six qubits, employs ZZFeatureMap encodings with variational ansatz (RealAmplitudes) across multiple architectures Variational Quantum Regressors (VQR), Quantum Support Vector Regressors (QSVR), Quantum Neural Networks (QNN), Quantum Convolutional Neural Networks (QCNN), and Quantum Random Forests (QRF). These are rigorously benchmarked against strong classical baselines, including Support Vector Regression (SVR), Random Forests (RF), and Multi-Layer Perceptrons (MLP). Comprehensive evaluation spanning absolute and relative error metrics, threshold accuracies, and error distributions shows that top-performing quantum models (QCNN, QRF) match the predictive accuracy and robustness of classical ensembles and deep networks, particularly within the chemically prevalent mid-range BDE regime. These findings establish a transparent baseline for quantum-enhanced molecular property prediction and outline a practical foundation for advancing quantum computational chemistry toward near chemical accuracy.
Updated: 2025-10-08 01:32:26
标题: 调整量子机器学习以适应键的能量解离
摘要: 准确预测键解离能(BDEs)是深入理解机理和合理设计分子和材料的基础。我们提出了一个系统的、可重复的基准,比较了用于BDE预测的量子和经典机器学习模型,使用一个经过化学精心策划的特征集,包括原子性质(原子序数、杂化化合物)、键特性(键序、类型)和局部环境描述符。我们的量子框架,实现在六个量子位上的Qiskit Aer中,采用ZZFeatureMap编码与变分量子态(RealAmplitudes)在多种架构的变分量子回归器(VQR)、量子支持向量回归器(QSVR)、量子神经网络(QNN)、量子卷积神经网络(QCNN)和量子随机森林(QRF)。这些模型经过严格的基准测试,与强大的经典基线进行了比较,包括支持向量回归(SVR)、随机森林(RF)和多层感知器(MLP)。全面评估跨越绝对和相对误差度量、阈值准确性和误差分布,显示表现最佳的量子模型(QCNN、QRF)与传统集成和深度网络的预测准确性和稳健性相匹配,特别是在化学中普遍存在的中等BDE范围内。这些发现为量子增强分子性质预测建立了透明的基准,并概述了推动量子计算化学朝着接近化学精度的实践基础。
更新时间: 2025-10-08 01:32:26
领域: quant-ph,cs.LG
Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias
For overparameterized linear regression with isotropic Gaussian design and minimum-$\ell_p$ interpolator $p\in(1,2]$, we give a unified, high-probability characterization for the scaling of the family of parameter norms $ \\{ \lVert \widehat{w_p} \rVert_r \\}_{r \in [1,p]} $ with sample size. We solve this basic, but unresolved question through a simple dual-ray analysis, which reveals a competition between a signal *spike* and a *bulk* of null coordinates in $X^\top Y$, yielding closed-form predictions for (i) a data-dependent transition $n_\star$ (the "elbow"), and (ii) a universal threshold $r_\star=2(p-1)$ that separates $\lVert \widehat{w_p} \rVert_r$'s which plateau from those that continue to grow with an explicit exponent. This unified solution resolves the scaling of *all* $\ell_r$ norms within the family $r\in [1,p]$ under $\ell_p$-biased interpolation, and explains in one picture which norms saturate and which increase as $n$ grows. We then study diagonal linear networks (DLNs) trained by gradient descent. By calibrating the initialization scale $\alpha$ to an effective $p_{\mathrm{eff}}(\alpha)$ via the DLN separable potential, we show empirically that DLNs inherit the same elbow/threshold laws, providing a predictive bridge between explicit and implicit bias. Given that many generalization proxies depend on $\lVert \widehat {w_p} \rVert_r$, our results suggest that their predictive power will depend sensitively on which $l_r$ norm is used.
Updated: 2025-10-08 01:23:07
标题: 具有数据的封闭形式$\ell_r$范数缩放的过参数化线性回归和对角线性网络在$\ell_p$偏差下的翻译
摘要: 对于具有各向同性高斯设计和最小-$\ell_p$插值器$p\in(1,2]$的超参数化线性回归,我们给出了一个统一的、高概率的特征化结果,描述了参数范数族$ \\{ \lVert \widehat{w_p} \rVert_r \\}_{r \in [1,p]} $随样本量的缩放。 通过简单的对偶射线分析,我们解决了这个基本但未解决的问题,揭示了$X^\top Y$中信号*尖峰*和*大量*空坐标之间的竞争,从而得出了关于(i)数据相关转换$n_\star$("拐点")和(ii)一个分隔$\lVert \widehat{w_p} \rVert_r$的通用阈值$r_\star=2(p-1)$的封闭形式预测,这个阈值将那些趋于平稳的$\ell_r$范数与那些继续以明确指数增长的范数分开。 这个统一解决方案解决了在$\ell_p$偏置插值下*所有*族$r\in [1,p]$内的$\ell_r$范数的缩放问题,并在一个图像中解释了哪些范数会饱和,哪些会随着$n$增长而增加。 然后,我们研究了由梯度下降训练的对角线性网络(DLNs)。通过通过DLN可分势函数校准初始化尺度$\alpha$到一个有效的$p_{\mathrm{eff}}(\alpha)$,我们通过实验证明DLNs继承了相同的拐点/阈值定律,提供了明确和隐含偏差之间的预测桥梁。 鉴于许多泛化代理取决于$\lVert \widehat {w_p} \rVert_r$,我们的结果表明它们的预测能力将敏感地取决于使用哪种$l_r$范数。
更新时间: 2025-10-08 01:23:07
领域: cs.LG,math.ST,stat.ML,stat.TH
Barbarians at the Gate: How AI is Upending Systems Research
Artificial Intelligence (AI) is starting to transform the research process as we know it by automating the discovery of new solutions. Given a task, the typical AI-driven approach is (i) to generate a set of diverse solutions, and then (ii) to verify these solutions and select one that solves the problem. Crucially, this approach assumes the existence of a reliable verifier, i.e., one that can accurately determine whether a solution solves the given problem. We argue that systems research, long focused on designing and evaluating new performance-oriented algorithms, is particularly well-suited for AI-driven solution discovery. This is because system performance problems naturally admit reliable verifiers: solutions are typically implemented in real systems or simulators, and verification reduces to running these software artifacts against predefined workloads and measuring performance. We term this approach as AI-Driven Research for Systems (ADRS), which iteratively generates, evaluates, and refines solutions. Using penEvolve, an existing open-source ADRS instance, we present case studies across diverse domains, including load balancing for multi-region cloud scheduling, Mixture-of-Experts inference, LLM-based SQL queries, and transaction scheduling. In multiple instances, ADRS discovers algorithms that outperform state-of-the-art human designs (e.g., achieving up to 5.0x runtime improvements or 50% cost reductions). We distill best practices for guiding algorithm evolution, from prompt design to evaluator construction, for existing frameworks. We then discuss the broader implications for the systems community: as AI assumes a central role in algorithm design, we argue that human researchers will increasingly focus on problem formulation and strategic guidance. Our results highlight both the disruptive potential and the urgent need to adapt systems research practices in the age of AI.
Updated: 2025-10-08 01:21:49
标题: 大门口的野蛮人:人工智能如何颠覆系统研究
摘要: 人工智能(AI)开始改变我们所知的研究过程,通过自动发现新解决方案。在给定一个任务的情况下,典型的AI驱动方法是(i)生成一组多样化的解决方案,然后(ii)验证这些解决方案并选择一个解决问题的方案。关键是,这种方法假定存在一个可靠的验证器,即能够准确确定一个解决方案是否解决了给定问题的验证器。我们认为,长期致力于设计和评估新性能导向算法的系统研究特别适合AI驱动的解决方案发现。这是因为系统性能问题自然地包含可靠的验证器:解决方案通常在实际系统或模拟器中实现,验证简化为运行这些软件构件以针对预定义的工作负载并测量性能。我们将这种方法称为AI驱动的系统研究(ADRS),它迭代地生成、评估和改进解决方案。使用现有的开源ADRS实例penEvolve,我们展示了跨多个领域的案例研究,包括多区域云调度的负载平衡、专家混合推理、基于LLM的SQL查询和事务调度。在多个实例中,ADRS发现了优于最新人类设计的算法(例如,实现了高达5.0倍的运行时改进或50%的成本降低)。我们总结了引导算法演化的最佳实践,从设计提示到评估器构建,为现有框架提供指导。然后,我们讨论了对系统社区的更广泛影响:随着AI在算法设计中扮演中心角色,我们认为人类研究人员将越来越专注于问题形式化和战略指导。我们的研究结果突显了AI时代系统研究实践适应的潜在颠覆性和迫切需要。
更新时间: 2025-10-08 01:21:49
领域: cs.AI
The Markovian Thinker
Reinforcement learning (RL) has recently become a strong recipe for training reasoning LLMs that produce long chains of thought (LongCoT). Yet the standard RL "thinking environment", where the state is the prompt plus all prior reasoning tokens, makes the state unbounded and forces attention-based policies to pay quadratic compute as thoughts lengthen. We revisit the environment itself. We propose Markovian Thinking, a paradigm in which the policy advances reasoning while conditioning on a constant-size state, decoupling thinking length from context size. As an immediate consequence this yields linear compute with constant memory. We instantiate this idea with Delethink, an RL environment that structures reasoning into fixed-size chunks. Within each chunk, the model thinks as usual; at the boundary, the environment resets the context and reinitializes the prompt with a short carryover. Through RL, the policy learns to write a textual state near the end of each chunk sufficient for seamless continuation of reasoning after reset. Trained in this environment, an R1-Distill 1.5B model reasons in 8K-token chunks yet thinks up to 24K tokens, matching or surpassing LongCoT-RL trained with a 24K budget. With test-time scaling, Delethink continues to improve where LongCoT plateaus. The effect of linear compute is substantial: we empirically estimate at 96K average thinking length LongCoT-RL costs 27 H100-months vs. 7 for Delethink. Analysis at RL initialization shows off-the-shelf reasoning models (1.5B-120B) often sample Markovian traces zero-shot across diverse benchmarks, providing positive samples that make RL effective at scale. Our results show that redesigning the thinking environment is a powerful lever: it enables very long reasoning without quadratic overhead and opens a path toward efficient, scalable reasoning LLMs.
Updated: 2025-10-08 01:18:13
标题: 马尔可夫思想家
摘要: 强化学习(RL)最近已成为训练产生长串思维(LongCoT)的推理LLM的强有力方法。然而,标准RL“思考环境”中,状态是提示加上所有先前的推理标记,使得状态无界,并迫使基于注意力的策略在思考变长时支付二次计算。我们重新审视了环境本身。我们提出马尔可夫思维,这是一种范式,其中策略在条件于一个恒定大小的状态的基础上推进推理,将思考长度与上下文大小分离。作为直接结果,这产生了具有恒定内存的线性计算。我们用Delethink实现了这个想法,这是一个将推理结构化为固定大小块的RL环境。在每个块内,模型像往常一样思考;在边界处,环境重新设置上下文,并用一个简短的延续重新初始化提示。通过RL,策略学会在每个块结束附近编写一个文本状态,足以在重置后无缝继续推理。在这个环境中训练,一个R1-Distill 1.5B模型在8K标记块中推理,但思考达到24K标记,与使用24K预算训练的LongCoT-RL相匹配或超越。通过测试时间缩放,Delethink在LongCoT停滞时继续改进。线性计算的效果是显著的:我们在RL初始化时经验估计,平均思考长度为96K的LongCoT-RL成本为27个H100月,而Delethink为7。在RL初始化时的分析显示,现成的推理模型(1.5B-120B)经常在各种基准测试中零样本采样马尔可夫轨迹,提供使RL在规模上有效的正样本。我们的结果表明,重新设计思考环境是一个强大的杠杆:它使得非常长的推理成为可能,避免了二次开销,并为高效、可扩展的推理LLM打开了一条道路。
更新时间: 2025-10-08 01:18:13
领域: cs.LG,cs.AI,cs.CL
From Acceleration to Saturation: Scaling Behavior of Bootstrapped Language Model Pretraining
Bootstrapped pretraining, i.e., the reuse of a pretrained base model for further pretraining, such as continual pretraining or model growth, is promising at reducing the cost of training language models from scratch. However, its effectiveness remains unclear, especially when applied to overtrained base models. In this work, we empirically study the scaling behavior of bootstrapped pretraining and find that its scaling efficiency diminishes in a predictable manner: The scaling exponent with respect to second-stage pretraining tokens decreases logarithmically with the number of tokens used to pretrain the base model. The joint dependence on first- and second-stage tokens is accurately modeled by a simple scaling law. Such saturation effect reveals a fundamental trade-off in multi-stage pretraining strategies: the more extensively a model is pretrained, the less additional benefit bootstrapping provides. Our findings provide practical insights for efficient language model training and raise important considerations for the reuse of overtrained models.
Updated: 2025-10-08 00:59:33
标题: 从加速到饱和:引导式语言模型预训练的尺度行为
摘要: 引导式预训练,即再次利用预训练的基础模型进行进一步的预训练,比如持续预训练或模型增长,有望降低从头开始训练语言模型的成本。然而,其有效性仍不明确,特别是当应用于过度训练的基础模型时。在这项工作中,我们通过实证研究了引导式预训练的规模行为,并发现其规模效率以可预测的方式减少:与第二阶段预训练标记相关的缩放指数随用于预训练基础模型的标记数量呈对数减少。对第一阶段和第二阶段标记的联合依赖关系可以通过一个简单的缩放定律准确建模。这种饱和效应揭示了多阶段预训练策略中的一个基本权衡:模型预训练得越充分,引导提供的额外好处就越少。我们的发现为有效的语言模型训练提供了实用见解,并提出了重用过度训练模型的重要考虑因素。
更新时间: 2025-10-08 00:59:33
领域: cs.CL,cs.LG
Incoherence in goal-conditioned autoregressive models
We investigate mathematically the notion of incoherence: a structural issue with reinforcement learning policies derived by naive goal-conditioning of autoregressive models. We focus on the process of re-training models on their own actions, that is, fine-tuning offline-learned policies with online RL. We prove that it decreases incoherence and leads to an improvement in return, and we aim to characterize the resulting trajectory of policies. By re-framing standard notions of control-as-inference and soft Q learning, we establish a three-way correspondence with two other ways of understanding the iterative re-training process: as folding the posterior into the reward and, in the deterministic case, as decreasing the temperature parameter; the correspondence has computational content via the training-inference trade-off. Through soft-conditioning generative models, we discuss the link between incoherence and the effective horizon.
Updated: 2025-10-08 00:52:13
标题: 目标条件自回归模型中的不一致性
摘要: 我们在数学上研究了不一致性的概念:这是由于自动回归模型的天真目标条件派生的强化学习策略的一个结构问题。我们重点关注在其自身行为上重新训练模型的过程,即用在线强化学习对离线学习的策略进行微调。我们证明这种方法可以降低不一致性,并导致回报的改善,我们旨在描述由此产生的策略轨迹。通过重新构建控制即推理和软Q学习的标准概念,我们建立了与另外两种理解迭代重新训练过程的方式的三方对应关系:将后验折叠到奖励中,以及在确定性情况下,降低温度参数;这种对应关系通过训练推理权衡具有计算内容。通过软条件生成模型,我们讨论了不一致性和有效视野之间的联系。
更新时间: 2025-10-08 00:52:13
领域: cs.LG,cs.AI
Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race
As advances in synthetic voice generation accelerate, an increasing variety of fake voice generators have emerged, producing audio that is often indistinguishable from real human speech. This evolution poses new and serious threats across sectors where audio recordings serve as critical evidence. Although fake voice detectors are also advancing, the arms race between fake voice generation and detection has become more intense and complex. In this work, we present the first large-scale, cross-domain evaluation of fake voice detectors, benchmarking 8 state-of-the-art models against datasets synthesized by 20 different fake voice generation systems. To the best of our knowledge, this is the most comprehensive cross-domain assessment conducted to date. Our study reveals substantial security vulnerabilities in current fake voice detection systems, underscoring critical gaps in their real-world robustness. To advance the field, we propose a unified and effective metric that consolidates the diverse and often inconsistent evaluation criteria previously used across different studies. This metric enables standardized, straightforward comparisons of the robustness of fake voice detectors. We conclude by offering actionable recommendations for building more resilient fake voice detection technologies, with the broader goal of reinforcing the foundations of AI security and trustworthiness.
Updated: 2025-10-08 00:52:06
标题: 在假声音生成的竞争中进行假声音检测的基准测试
摘要: 随着合成语音生成技术的进步加速,越来越多的虚假语音生成器出现,产生的音频往往与真实人类语音无法区分。这种演变在各个领域引发了新的严重威胁,其中音频记录作为关键证据。虽然虚假语音检测器也在不断进步,但虚假语音生成和检测之间的“军备竞赛”变得更加激烈和复杂。在这项工作中,我们首次进行了大规模、跨领域的虚假语音检测器评估,将8种最先进的模型与由20种不同的虚假语音生成系统合成的数据集进行了基准测试。据我们所知,这是迄今为止进行的最全面的跨领域评估。我们的研究揭示了当前虚假语音检测系统中存在的严重安全漏洞,突显了它们在实际世界中的鲁棒性方面的重要差距。为推动该领域的发展,我们提出了一个统一而有效的度量标准,整合了之前在不同研究中使用的多样化且常常不一致的评估标准。该度量标准使得对虚假语音检测器的鲁棒性进行标准化、简单的比较成为可能。最后,我们提出了建立更具弹性的虚假语音检测技术的可操作建议,旨在加强人工智能安全和可信度的基础。
更新时间: 2025-10-08 00:52:06
领域: cs.SD,cs.CR,eess.AS
Cluster Paths: Navigating Interpretability in Neural Networks
While modern deep neural networks achieve impressive performance in vision tasks, they remain opaque in their decision processes, risking unwarranted trust, undetected biases and unexpected failures. We propose cluster paths, a post-hoc interpretability method that clusters activations at selected layers and represents each input as its sequence of cluster IDs. To assess these cluster paths, we introduce four metrics: path complexity (cognitive load), weighted-path purity (class alignment), decision-alignment faithfulness (predictive fidelity), and path agreement (stability under perturbations). In a spurious-cue CIFAR-10 experiment, cluster paths identify color-based shortcuts and collapse when the cue is removed. On a five-class CelebA hair-color task, they achieve 90% faithfulness and maintain 96% agreement under Gaussian noise without sacrificing accuracy. Scaling to a Vision Transformer pretrained on ImageNet, we extend cluster paths to concept paths derived from prompting a large language model on minimal path divergences. Finally, we show that cluster paths can serve as an effective out-of-distribution (OOD) detector, reliably flagging anomalous samples before the model generates over-confident predictions. Cluster paths uncover visual concepts, such as color palettes, textures, or object contexts, at multiple network depths, demonstrating that cluster paths scale to large vision models while generating concise and human-readable explanations.
Updated: 2025-10-08 00:41:09
标题: 集群路径:在神经网络中导航可解释性
摘要: 尽管现代深度神经网络在视觉任务中取得了令人印象深刻的性能,但它们在决策过程中仍然不透明,存在风险,可能导致不合理的信任、未被发现的偏见和意外的失败。我们提出了一种后置解释性方法——集群路径,该方法在选定的层中对激活进行聚类,并将每个输入表示为其集群ID序列。为了评估这些集群路径,我们引入了四个指标:路径复杂度(认知负荷)、加权路径纯度(类别对齐)、决策对齐忠实度(预测准确度)和路径一致性(受扰动影响的稳定性)。在一个虚假线索的CIFAR-10实验中,集群路径识别了基于颜色的捷径,并在线索消失时崩溃。在一个五类CelebA头发颜色任务中,它们实现了90%的忠实度,并在高斯噪声下保持了96%的一致性,而不会牺牲准确性。在一个在ImageNet上预训练的Vision Transformer上进行扩展,我们将集群路径扩展到从提示一个大型语言模型上导出最小路径分歧的概念路径。最后,我们展示了集群路径可以作为一种有效的异常样本检测器,在模型生成过于自信的预测之前可靠地标记异常样本。集群路径揭示了视觉概念,如色彩调色板、纹理或对象背景,在多个网络深度上,证明了集群路径适用于大型视觉模型,同时生成简洁易读的解释。
更新时间: 2025-10-08 00:41:09
领域: cs.CV,cs.LG
Scalable Policy-Based RL Algorithms for POMDPs
The continuous nature of belief states in POMDPs presents significant computational challenges in learning the optimal policy. In this paper, we consider an approach that solves a Partially Observable Reinforcement Learning (PORL) problem by approximating the corresponding POMDP model into a finite-state Markov Decision Process (MDP) (called Superstate MDP). We first derive theoretical guarantees that improve upon prior work that relate the optimal value function of the transformed Superstate MDP to the optimal value function of the original POMDP. Next, we propose a policy-based learning approach with linear function approximation to learn the optimal policy for the Superstate MDP. Consequently, our approach shows that a POMDP can be approximately solved using TD-learning followed by Policy Optimization by treating it as an MDP, where the MDP state corresponds to a finite history. We show that the approximation error decreases exponentially with the length of this history. To the best of our knowledge, our finite-time bounds are the first to explicitly quantify the error introduced when applying standard TD learning to a setting where the true dynamics are not Markovian.
Updated: 2025-10-08 00:33:38
标题: 可扩展的基于策略的POMDP强化学习算法
摘要: 在部分可观察强化学习(PORL)问题中,POMDP中信念状态的连续性给学习最优策略带来了重要的计算挑战。在本文中,我们考虑一种方法,通过将相应的POMDP模型近似为有限状态的马尔可夫决策过程(MDP)(称为超级状态MDP)来解决PORL问题。我们首先推导出理论保证,改进了与转换后的超级状态MDP的最优值函数与原始POMDP的最优值函数相关的先前工作。接下来,我们提出了一种基于策略的学习方法,采用线性函数逼近来学习超级状态MDP的最优策略。因此,我们的方法表明可以通过将POMDP视为MDP,使用TD学习后跟随策略优化来近似解决POMDP问题,其中MDP状态对应于有限的历史。我们表明,逼近误差随着这个历史的长度呈指数级下降。据我们所知,我们的有限时间界限是第一个明确量化标准TD学习引入的误差的界限,当真实动态不是马尔可夫时。
更新时间: 2025-10-08 00:33:38
领域: cs.LG,cs.AI,stat.ML
Auto-Prompt Ensemble for LLM Judge
We present a novel framework that improves the reliability of LLM judges by selectively augmenting LLM with auxiliary evaluation dimensions. Existing LLM judges often miss crucial evaluation dimensions because they fail to recognize the implicit standards underlying human assessments. To address this challenge, we propose the Auto-Prompt Ensemble (APE), an adaptive framework that automatically learns evaluation dimensions from its failure cases. APE incorporates a confidence-based ensemble mechanism to decide when to adopt the judgments from additional evaluation dimensions through a novel confidence estimation approach called Collective Confidence. Extensive experiments demonstrate that APE improves the reliability of LLM Judge across diverse standard benchmarks. For instance, APE enhances GPT-4o agreement rate on Reward Bench from 87.2% to 90.5% in the zero-shot setting. Overall, APE provides a principled approach for LLM Judge to leverage test-time computation, and bridge the evaluation gap between human and LLM judges.
Updated: 2025-10-08 00:28:51
标题: 自动提示集合用于LLM法官
摘要: 我们提出了一个新颖的框架,通过选择性地增加LLM的辅助评估维度来提高LLM评判者的可靠性。现有的LLM评判者经常会错过关键的评估维度,因为他们未能识别出人类评估背后的隐含标准。为了解决这一挑战,我们提出了Auto-Prompt Ensemble (APE),这是一个自适应框架,可以从失败案例中自动学习评估维度。APE结合了基于信心的集成机制,通过一种称为Collective Confidence的新颖信心估计方法来决定何时采用来自额外评估维度的判断。大量实验证明,APE提高了LLM Judge在各种标准基准上的可靠性。例如,在零-shot设置下,APE将GPT-4o在Reward Bench上的一致率从87.2%提高到90.5%。总体而言,APE为LLM Judge提供了一种原则性的方法,可以利用测试时间计算,并弥合人类评判者和LLM评判者之间的评估差距。
更新时间: 2025-10-08 00:28:51
领域: cs.AI,cs.LG
SpyChain: Multi-Vector Supply Chain Attacks on Small Satellite Systems
Small satellites are integral to scientific, commercial, and defense missions, but reliance on commercial off-the-shelf (COTS) hardware broadens their attack surface. Although supply chain threats are well studied in other cyber-physical domains, their feasibility and stealth in space systems remain largely unexplored. Prior work has focused on flight software, which benefits from strict security practices and oversight. In contrast, auxiliary COTS components often lack robust assurance yet enjoy comparable access to critical on-board resources, including telemetry, system calls, and the software bus. Despite this privileged access, the insider threat within COTS hardware supply chains has received little attention. In this work, we present SpyChain, the first end-to-end design and implementation of independent and colluding hardware supply chain threats targeting small satellites. Using NASA's satellite simulation (NOS3), we demonstrate that SpyChain can evade testing, exfiltrate telemetry, disrupt operations, and launch Denial of Service (DoS) attacks through covert channels that bypass ground monitoring. Our study traces an escalation from a simple solo component to dynamic, coordinating malware, introducing a taxonomy of stealth across five scenarios. We showcase how implicit trust in auxiliary components enables covert persistence and reveal novel attack vectors, highlighting a new multi-component execution technique that is now incorporated into the SPARTA matrix. Our findings are reinforced by acknowledgment and affirmation from NASA's NOS3 team. Finally, we implement lightweight onboard defenses, including runtime monitoring, to mitigate threats like SpyChain.
Updated: 2025-10-08 00:21:40
标题: SpyChain:对小卫星系统的多向供应链攻击
摘要: 小卫星在科学、商业和国防任务中起着重要作用,但依赖商用现成硬件扩大了它们的攻击面。尽管供应链威胁在其他网络物理领域得到了广泛研究,但它们在太空系统中的可行性和隐蔽性仍然是未被充分探索的。以往的研究主要集中在飞行软件上,这些软件受益于严格的安全实践和监督。相反,辅助的商用现成元件通常缺乏强大的保证,但与重要的机载资源(包括遥测、系统调用和软件总线)具有可比的访问权限。尽管拥有这种特权访问权限,商用现成硬件供应链中内部威胁却受到了很少的关注。在这项工作中,我们提出了SpyChain,这是第一个独立和勾结的硬件供应链威胁针对小卫星的端到端设计和实现。利用NASA的卫星模拟(NOS3),我们展示了SpyChain可以逃避测试,窃取遥测数据,干扰操作,并通过绕过地面监控的隐蔽通道发动拒绝服务(DoS)攻击。我们的研究追踪了从简单的独立组件到动态、协调的恶意软件的升级,引入了五种情景的隐蔽分类。我们展示了如何对辅助元件的隐含信任使得隐蔽持久性成为可能,并揭示了新的攻击向量,突出了一种新的多组件执行技术,现已纳入SPARTA矩阵。我们的发现得到了NASA的NOS3团队的认可和肯定。最后,我们实施了轻量级的机载防御措施,包括运行时监控,以减轻像SpyChain这样的威胁。
更新时间: 2025-10-08 00:21:40
领域: cs.CR,C.3; D.4.6; K.6.5
Beneficial Reasoning Behaviors in Agentic Search and Effective Post-training to Obtain Them
Agentic search leverages large language models (LLMs) to interpret complex user information needs and execute a multi-step process of planning, searching, and synthesizing information to provide answers. This paradigm introduces unique challenges for LLMs' reasoning and agentic capabilities when interacting with retrieval systems and the broader web. In this paper, we propose a reasoning-driven LLM-based pipeline to study effective reasoning behavior patterns in agentic search. Using this pipeline, we analyze successful agentic search trajectories and identify four beneficial reasoning behaviors: Information Verification, Authority Evaluation, Adaptive Search, and Error Recovery. Based on these findings, we propose a technique called Behavior Priming to train more effective agentic search models. It synthesizes agentic search trajectories that exhibit these four behaviors and integrates them into the agentic search model through supervised fine-tuning (SFT), followed by standard reinforcement learning (RL). Experiments on three benchmarks (GAIA, WebWalker, and HLE) demonstrate that behavior priming yields over 35% gains in Llama3.2-3B and Qwen3-1.7B compared to directly training agentic search models with RL. Crucially, we demonstrate that the desired reasoning behaviors in the SFT data, rather than the correctness of the final answer, is the critical factor for achieving strong final performance after RL: fine-tuning on trajectories with desirable reasoning behaviors but incorrect answers leads to better performance than fine-tuning on trajectories with correct answers. Our analysis further reveals the underlying mechanism: the introduced reasoning behaviors endow models with more effective exploration (higher pass@k and entropy) and test-time scaling (longer trajectories) capabilities, providing a strong foundation for RL. Our code will be released as open source.
Updated: 2025-10-08 00:20:35
标题: 在代理搜索中有益的推理行为及有效的后训练获取方式
摘要: 主动搜索利用大型语言模型(LLMs)来解释复杂的用户信息需求,并执行规划、搜索和合成信息的多步骤过程,以提供答案。这种范式在与检索系统和更广泛的网络交互时,为LLMs的推理和主动能力带来了独特的挑战。在本文中,我们提出了一种基于推理的LLM管道,用于研究主动搜索中有效的推理行为模式。通过这个管道,我们分析成功的主动搜索轨迹,并确定了四种有益的推理行为:信息验证、权威评估、自适应搜索和错误恢复。基于这些发现,我们提出了一种称为行为启动的技术,用于训练更有效的主动搜索模型。它合成展现这四种行为的主动搜索轨迹,并通过监督微调(SFT)将它们整合到主动搜索模型中,然后进行标准强化学习(RL)。在三个基准测试(GAIA、WebWalker和HLE)上的实验证明,与直接使用RL训练主动搜索模型相比,行为启动使Llama3.2-3B和Qwen3-1.7B获得了超过35%的增益。关键是,我们证明在RL之后实现强大的最终表现的关键因素是SFT数据中所需的推理行为,而不是最终答案的正确性:在展示出理想的推理行为但错误答案的轨迹上微调,比在展示出正确答案的轨迹上微调,效果更好。我们的分析进一步揭示了潜在的机制:引入的推理行为赋予模型更有效的探索(更高的pass@k和熵)和测试时间扩展(更长的轨迹)能力,为RL提供了强有力的基础。我们的代码将作为开源发布。
更新时间: 2025-10-08 00:20:35
领域: cs.AI,cs.LG
From Description to Detection: LLM based Extendable O-RAN Compliant Blind DoS Detection in 5G and Beyond
The quality and experience of mobile communication have significantly improved with the introduction of 5G, and these improvements are expected to continue beyond the 5G era. However, vulnerabilities in control-plane protocols, such as Radio Resource Control (RRC) and Non-Access Stratum (NAS), pose significant security threats, such as Blind Denial of Service (DoS) attacks. Despite the availability of existing anomaly detection methods that leverage rule-based systems or traditional machine learning methods, these methods have several limitations, including the need for extensive training data, predefined rules, and limited explainability. Addressing these challenges, we propose a novel anomaly detection framework that leverages the capabilities of Large Language Models (LLMs) in zero-shot mode with unordered data and short natural language attack descriptions within the Open Radio Access Network (O-RAN) architecture. We analyse robustness to prompt variation, demonstrate the practicality of automating the attack descriptions and show that detection quality relies on the semantic completeness of the description rather than its phrasing or length. We utilise an RRC/NAS dataset to evaluate the solution and provide an extensive comparison of open-source and proprietary LLM implementations to demonstrate superior performance in attack detection. We further validate the practicality of our framework within O-RAN's real-time constraints, illustrating its potential for detecting other Layer-3 attacks.
Updated: 2025-10-08 00:13:02
标题: 从描述到检测:基于LLM的可扩展O-RAN兼容的5G及更高版本的盲目DoS检测.
摘要: 随着5G的引入,移动通信的质量和体验显著提高,这些改进预计将持续超越5G时代。然而,控制平面协议(如无线资源控制(RRC)和非接入层(NAS))中的漏洞构成了重大安全威胁,例如盲目拒绝服务(DoS)攻击。尽管现有的异常检测方法利用基于规则的系统或传统机器学习方法,但这些方法存在一些限制,包括对大量训练数据、预定义规则和有限的可解释性的需求。为了解决这些挑战,我们提出了一个新颖的异常检测框架,利用零-shot模式下大型语言模型(LLMs)的能力,以无序数据和短自然语言攻击描述为基础,结合开放无线接入网络(O-RAN)架构。我们分析了对提示变化的鲁棒性,展示了自动化攻击描述的实用性,并表明检测质量取决于描述的语义完整性而不是措辞或长度。我们利用一个RRC/NAS数据集来评估解决方案,并对开源和专有LLM实现进行了广泛比较,以展示在攻击检测方面的卓越性能。我们进一步验证了我们的框架在O-RAN实时约束条件下的实用性,展示了它在检测其他第3层攻击方面的潜力。
更新时间: 2025-10-08 00:13:02
领域: cs.CR,cs.ET,cs.LG,cs.NI
Platonic Transformers: A Solid Choice For Equivariance
While widespread, Transformers lack inductive biases for geometric symmetries common in science and computer vision. Existing equivariant methods often sacrifice the efficiency and flexibility that make Transformers so effective through complex, computationally intensive designs. We introduce the Platonic Transformer to resolve this trade-off. By defining attention relative to reference frames from the Platonic solid symmetry groups, our method induces a principled weight-sharing scheme. This enables combined equivariance to continuous translations and Platonic symmetries, while preserving the exact architecture and computational cost of a standard Transformer. Furthermore, we show that this attention is formally equivalent to a dynamic group convolution, which reveals that the model learns adaptive geometric filters and enables a highly scalable, linear-time convolutional variant. Across diverse benchmarks in computer vision (CIFAR-10), 3D point clouds (ScanObjectNN), and molecular property prediction (QM9, OMol25), the Platonic Transformer achieves competitive performance by leveraging these geometric constraints at no additional cost.
Updated: 2025-10-08 00:09:15
标题: 柏拉图变换器:一种适用于等变性的稳固选择
摘要: 尽管广泛存在,Transformer在科学和计算机视觉中常见的几何对称性方面缺乏归纳偏差。现有的等变方法通常牺牲了使Transformer如此有效的效率和灵活性,通过复杂的、计算密集的设计。我们引入了Platonic Transformer来解决这种权衡。通过将注意力定义为相对于Platonic固体对称群的参考框架,我们的方法引入了一种原则性的权重共享方案。这使得我们能够同时实现对连续平移和Platonic对称性的等变性,同时保持标准Transformer的精确架构和计算成本。此外,我们展示了这种注意力在形式上等同于动态群卷积,这揭示了该模型学习自适应几何滤波器并实现了一种高度可扩展的、线性时间的卷积变体。在计算机视觉(CIFAR-10)、3D点云(ScanObjectNN)和分子属性预测(QM9、OMol25)等各种基准测试中,Platonic Transformer利用这些几何约束以零额外成本取得了竞争性表现。
更新时间: 2025-10-08 00:09:15
领域: cs.CV,cs.AI,cs.LG,eess.IV
Rethinking Inter-LoRA Orthogonality in Adapter Merging: Insights from Orthogonal Monte Carlo Dropout
We propose Orthogonal Monte Carlo Dropout, a mechanism that enforces strict orthogonality when combining sparse semantic vectors without extra time complexity. Low-Rank Adaptation (LoRA), a popular fine-tuning method for large models, typically trains a module to represent a specific concept such as an object or a style. When multiple LoRA modules are merged, for example to generate an object in a particular style, their outputs (semantic vectors) may interfere with each other. Our method guarantees that merged LoRA modules remain orthogonal and thus free from direct interference. However, empirical analysis reveals that such orthogonality does not lead to the semantic disentanglement highlighted in prior work on compositional adaptation. This finding suggests that inter-LoRA orthogonality alone may be insufficient for achieving true semantic compositionality, prompting a re-examination of its role in adapter merging.
Updated: 2025-10-08 00:05:16
标题: 重新思考适配器合并中的Inter-LoRA正交性:来自正交蒙特卡洛辍学的见解
摘要: 我们提出了正交蒙特卡罗Dropout,这是一种在合并稀疏语义向量时强制执行严格正交性的机制,而不会增加额外的时间复杂度。低秩适应(LoRA)是一种用于大型模型的流行微调方法,通常训练一个模块来代表特定概念,如对象或风格。当多个LoRA模块合并时,例如为了以特定风格生成一个对象,它们的输出(语义向量)可能会相互干扰。我们的方法确保合并的LoRA模块保持正交,因此不会直接干扰。然而,经验分析表明,这种正交性并不能导致在组合适应性先前研究中强调的语义解缠。这一发现表明,单纯的模块间正交性可能不足以实现真正的语义组合性,促使重新审视其在适配器合并中的作用。
更新时间: 2025-10-08 00:05:16
领域: cs.LG,cs.AI,cs.CV
BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on Pop and Classical Music
Automatic chord recognition (ACR) via deep learning models has gradually achieved promising recognition accuracy, yet two key challenges remain. First, prior work has primarily focused on audio-domain ACR, while symbolic music (e.g., score) ACR has received limited attention due to data scarcity. Second, existing methods still overlook strategies that are aligned with human music analytical practices. To address these challenges, we make two contributions: (1) we introduce POP909-CL, an enhanced version of POP909 dataset with tempo-aligned content and human-corrected labels of chords, beats, keys, and time signatures; and (2) We propose BACHI, a symbolic chord recognition model that decomposes the task into different decision steps, namely boundary detection and iterative ranking of chord root, quality, and bass (inversion). This mechanism mirrors the human ear-training practices. Experiments demonstrate that BACHI achieves state-of-the-art chord recognition performance on both classical and pop music benchmarks, with ablation studies validating the effectiveness of each module.
Updated: 2025-10-08 00:02:56
标题: BACHI:基于遮蔽迭代解码的边界感知符号和弦识别技术在流行音乐和古典音乐中的应用
摘要: 通过深度学习模型实现的自动和弦识别(ACR)逐渐取得了令人满意的识别准确度,但仍然存在两个关键挑战。首先,先前的研究主要集中在音频领域的ACR,而符号音乐(例如谱)的ACR由于数据稀缺而受到限制的关注。其次,现有方法仍然忽视与人类音乐分析实践一致的策略。为了解决这些挑战,我们做出了两个贡献:(1)我们引入了POP909-CL,这是一个增强版的POP909数据集,其中包含与节奏对齐的内容和人工校正的和弦、节拍、调性和拍号标签;(2)我们提出了BACHI,这是一个符号和弦识别模型,将任务分解为不同的决策步骤,即边界检测和和弦根音、品质和低音(倒置)的迭代排名。这种机制反映了人类的耳朵训练实践。实验证明,BACHI在古典音乐和流行音乐基准上实现了最先进的和弦识别性能,消融研究验证了每个模块的有效性。
更新时间: 2025-10-08 00:02:56
领域: cs.SD,cs.LG,eess.AS
Wide Neural Networks as a Baseline for the Computational No-Coincidence Conjecture
We establish that randomly initialized neural networks, with large width and a natural choice of hyperparameters, have nearly independent outputs exactly when their activation function is nonlinear with zero mean under the Gaussian measure: $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]=0$. For example, this includes ReLU and GeLU with an additive shift, as well as tanh, but not ReLU or GeLU by themselves. Because of their nearly independent outputs, we propose neural networks with zero-mean activation functions as a promising candidate for the Alignment Research Center's computational no-coincidence conjecture -- a conjecture that aims to measure the limits of AI interpretability.
Updated: 2025-10-08 00:02:22
标题: 宽神经网络作为计算无巧合猜想的基线
摘要: 我们确定,具有大宽度和自然选择的超参数的随机初始化神经网络,在其激活函数在高斯测度下具有零均值的非线性时,其输出几乎是独立的:$\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]=0$。例如,这包括具有加性偏移的ReLU和GeLU,以及tanh,但不包括单独的ReLU或GeLU。由于它们的输出几乎独立,我们提出具有零均值激活函数的神经网络作为对齐研究中心计算无重复猜想的一个有希望的候选者--一个旨在衡量人工智能可解释性极限的猜想。
更新时间: 2025-10-08 00:02:22
领域: cs.LG,stat.ML