
机器学习学术速递[12.7]


cs.LG 方向,今日共计165篇

Graph相关(图学习|图神经网络|图优化等)(7篇)

【1】 Distance and Hop-wise Structures Encoding Enhanced Graph Attention Networks 标题:增强型图注意网络的距离和跳数结构编码 链接:https://arxiv.org/abs/2112.02868

作者:Zhiguo Huang,Xiaowei Chen,Bojuan Wang 机构:Sci-Tech Academy of ZheJiang University;Research Center of Hundsun LTD., Hangzhou, China, School of Finance, NanKai University, Tianjin, China 备注:11 pages; 1 figure; 摘要:大量工作已经证明,现有的邻域平均图神经网络不能有效地捕捉结构特征,许多工作表明注入结构、距离、位置或空间特征可以显著提高GNN的性能,但将整体结构和距离信息注入GNN仍是一个直观却未被触及的想法。在这项工作中,我们对这一方向进行了探索:首先提取节点的逐跳结构信息并计算距离分布信息,再结合节点的内在特征,将它们嵌入到同一向量空间中并相加;将得到的嵌入向量送入GAT类模型(如GAT、AGDN),再进行Correct and Smooth后处理。实验表明DHSEGATs取得了有竞争力的结果。代码可在 https://github.com/hzg0601/DHSEGATs 获取。 摘要:Numerous works have proven that existing neighbor-averaging Graph Neural Networks cannot efficiently capture structure features, and many works show that injecting structure, distance, position or spatial features can significantly improve the performance of GNNs; however, injecting overall structure and distance into GNNs is an intuitive yet untouched idea. In this work, we shed light on this direction. We first extract hop-wise structure information and compute distance distributional information, combine them with each node's intrinsic features, embed them into the same vector space, and then add them up. The derived embedding vectors are then fed into GATs (like GAT, AGDN) followed by Correct and Smooth; experiments show that DHSEGATs achieve competitive results. The code is available at https://github.com/hzg0601/DHSEGATs.
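下面给出"逐跳结构信息 + 距离分布信息 + 节点自身特征,映射到同一向量空间后相加"这一思路的极简 numpy 示意,函数名、维度与随机初始化均为假设,并非论文官方实现:

```python
import numpy as np

def hop_wise_counts(adj: np.ndarray, max_hop: int) -> np.ndarray:
    """统计每个节点在第 1..max_hop 跳新到达的邻居数量(逐跳结构信息)。"""
    n = adj.shape[0]
    reach = np.eye(n, dtype=bool)
    frontier = np.eye(n, dtype=bool)
    counts = np.zeros((n, max_hop))
    for k in range(max_hop):
        frontier = (frontier @ adj > 0) & ~reach   # 第 k+1 跳新可达的节点
        reach |= frontier
        counts[:, k] = frontier.sum(axis=1)
    return counts

def dhse_embedding(x, adj, max_hop=3, dim=16, rng=np.random.default_rng(0)):
    """把结构/距离统计量与节点特征分别线性映射到同一空间后相加(示意)。"""
    struct = hop_wise_counts(adj, max_hop)                        # (n, max_hop)
    dist = struct / np.maximum(struct.sum(1, keepdims=True), 1)   # 距离分布
    w_x = rng.normal(size=(x.shape[1], dim))
    w_s = rng.normal(size=(max_hop, dim))
    w_d = rng.normal(size=(max_hop, dim))
    return x @ w_x + struct @ w_s + dist @ w_d                    # 相加后可送入 GAT

# 一个 4 节点链状图的玩具示例
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
x = np.eye(4)
print(dhse_embedding(x, adj).shape)   # (4, 16)
```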

【2】 CDGNet: A Cross-Time Dynamic Graph-based Deep Learning Model for Traffic Forecasting 标题:CDGNet:一种基于跨时间动态图的交通预测深度学习模型 链接:https://arxiv.org/abs/2112.02736

作者:Yuchen Fang,Yanjun Qin,Haiyong Luo,Fang Zhao,Liang Zeng,Bo Hui,Chenxing Wang 机构:Beijing University of Posts and, Telecommunications, Institute of Computing Technology, Chinese Academy of Sciences, Tsinghua University, Auburn University 备注:10 pages 摘要:交通预测是网络智能交通系统的重要组成部分,有利于交通安全,但由于现实交通系统中存在复杂的、动态的时空依赖关系,交通预测具有很大的挑战性。以前的方法使用预定义的或可学习的静态图来提取空间相关性。然而,基于静态图的方法无法挖掘交通网络的演化。研究人员随后为每个时间片生成动态图,以反映空间相关性的变化,但他们遵循独立建模时空相关性的范式,忽略了跨时间-空间的影响。在本文中,我们提出了一种新的基于跨时间动态图的深度学习模型CDGNet,用于交通预测。该模型利用跨时间动态图,能够有效地捕捉各时间片与其历史时间片之间的跨时间-空间依赖关系。同时,我们设计了一种门机制来稀疏跨时间动态图,这符合现实世界中的稀疏空间相关性。此外,我们还提出了一种新的编码器-解码器结构,将基于跨时间动态图的GCN用于多步流量预测。在三个真实公共交通数据集上的实验结果表明,CDGNet优于最先进的基线。此外,我们还提供了定性研究,以分析我们的体系结构的有效性。 摘要:Traffic forecasting is important in intelligent transportation systems of webs and beneficial to traffic safety, yet is very challenging because of the complex and dynamic spatio-temporal dependencies in real-world traffic systems. Prior methods use the pre-defined or learnable static graph to extract spatial correlations. However, the static graph-based methods fail to mine the evolution of the traffic network. Researchers subsequently generate the dynamic graph for each time slice to reflect the changes of spatial correlations, but they follow the paradigm of independently modeling spatio-temporal dependencies, ignoring the cross-time spatial influence. In this paper, we propose a novel cross-time dynamic graph-based deep learning model, named CDGNet, for traffic forecasting. The model is able to effectively capture the cross-time spatial dependence between each time slice and its historical time slices by utilizing the cross-time dynamic graph. Meanwhile, we design a gating mechanism to sparse the cross-time dynamic graph, which conforms to the sparse spatial correlations in the real world. Besides, we propose a novel encoder-decoder architecture to incorporate the cross-time dynamic graph-based GCN for multi-step traffic forecasting. Experimental results on three real-world public traffic datasets demonstrate that CDGNet outperforms the state-of-the-art baselines. We additionally provide a qualitative study to analyze the effectiveness of our architecture.

【3】 Trivial bundle embeddings for learning graph representations 标题:用于学习图表示的平凡束嵌入 链接:https://arxiv.org/abs/2112.02531

作者:Zheng Xie,Xiaojing Zuo,Yiping Song 机构:Received: date Accepted: date 备注:17 pages,4 figures 摘要:嵌入真实世界的网络带来了挑战,因为不清楚如何识别它们的潜在几何结构。在欧几里德空间中嵌入一些非分支网络(如无标度网络)会产生失真。将无标度网络嵌入到双曲空间提供了一个令人兴奋的替代方案,但当嵌入具有潜在几何体而非双曲的分类网络时,会产生失真。我们提出了一个归纳模型,该模型利用GCN和平凡束的表达能力来学习具有或不具有节点特征的网络的归纳节点表示。平凡丛是纤维丛的一个简单例子,纤维丛是一个整体上是其基本空间和纤维的乘积空间的空间。基空间坐标和纤维坐标可以用来表示边生成中的分类因子和非分类因子。因此,该模型能够学习能够表达这些因素的嵌入。实际上,与欧几里德和双曲GCN相比,它减少了链路预测和节点分类的错误。 摘要:Embedding real-world networks presents challenges because it is not clear how to identify their latent geometries. Embedding some disassortative networks, such as scale-free networks, to the Euclidean space has been shown to incur distortions. Embedding scale-free networks to hyperbolic spaces offer an exciting alternative but incurs distortions when embedding assortative networks with latent geometries not hyperbolic. We propose an inductive model that leverages both the expressiveness of GCNs and trivial bundle to learn inductive node representations for networks with or without node features. A trivial bundle is a simple case of fiber bundles,a space that is globally a product space of its base space and fiber. The coordinates of base space and those of fiber can be used to express the assortative and disassortative factors in generating edges. Therefore, the model has the ability to learn embeddings that can express those factors. In practice, it reduces errors for link prediction and node classification when compared to the Euclidean and hyperbolic GCNs.

【4】 Augmentation-Free Self-Supervised Learning on Graphs 标题:图上的无增广自监督学习 链接:https://arxiv.org/abs/2112.02472

作者:Namkyeong Lee,Junseok Lee,Chanyoung Park 机构: Dept. of Industrial and Systems Engineering, KAIST, Daejeon, Republic of Korea, Graduate School of Artificial Intelligence, KAIST, Daejeon, Republic of Korea 摘要:受图像自监督方法最近取得的成功的启发,图形结构数据的自监督学习得到了快速发展,尤其是基于增强的对比方法。然而,我们认为,如果没有精心设计的增广技术,图上的增广可能会表现得任意,因为图的底层语义可能会发生剧烈的变化。因此,现有基于增强的方法的性能高度依赖于增强方案的选择,即与增强相关的超参数。在本文中,我们提出了一种新的无增广自监督图学习框架AFGRL。具体来说,我们通过发现与图共享局部结构信息和全局语义的节点来生成图的另一种视图。对各种节点级任务(即节点分类、聚类和各种真实数据集上的相似性搜索)的大量实验证明了AFGRL的优越性。AFGRL的源代码可在https://github.com/Namkyeong/AFGRL. 摘要:Inspired by the recent success of self-supervised methods applied on images, self-supervised learning on graph structured data has seen rapid growth especially centered on augmentation-based contrastive methods. However, we argue that without carefully designed augmentation techniques, augmentations on graphs may behave arbitrarily in that the underlying semantics of graphs can drastically change. As a consequence, the performance of existing augmentation-based methods is highly dependent on the choice of augmentation scheme, i.e., hyperparameters associated with augmentations. In this paper, we propose a novel augmentation-free self-supervised learning framework for graphs, named AFGRL. Specifically, we generate an alternative view of a graph by discovering nodes that share the local structural information and the global semantics with the graph. Extensive experiments towards various node-level tasks, i.e., node classification, clustering, and similarity search on various real-world datasets demonstrate the superiority of AFGRL. The source code for AFGRL is available at https://github.com/Namkyeong/AFGRL.
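下面是一个示意性片段(numpy 实现,近邻数 k 等超参数为假设,并非 AFGRL 官方代码),演示"不做增广,而是用表示空间 k 近邻与图上邻居的交集来发现既共享局部结构又共享语义的正样本"这一核心想法:

```python
import numpy as np

def discover_positives(emb: np.ndarray, adj: np.ndarray, k: int = 4):
    """对每个节点,取表示空间 k 近邻与图上一阶邻居的交集作为正样本(示意)。"""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)                 # 排除自身
    knn = np.argsort(-sim, axis=1)[:, :k]          # 每个节点的 k 近邻下标
    positives = []
    for i in range(emb.shape[0]):
        neigh = set(np.nonzero(adj[i])[0].tolist())
        positives.append([j for j in knn[i] if j in neigh])  # 结构与语义的交集
    return positives

rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 8))                      # 假设由在线编码器得到的节点表示
adj = (rng.random((6, 6)) > 0.5).astype(float)
np.fill_diagonal(adj, 0)
print(discover_positives(emb, adj))
```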

【5】 Fast Graph Neural Tangent Kernel via Kronecker Sketching 标题:基于Kronecker草图的快速图神经切核 链接:https://arxiv.org/abs/2112.02446

作者:Shunhua Jiang,Yunze Man,Zhao Song,Zheng Yu,Danyang Zhuo 备注:AAAI 2022 摘要:许多深度学习任务必须处理图形(例如,蛋白质结构、社交网络、源代码抽象语法树)。由于这些任务的重要性,人们转向图形神经网络(GNN)作为实际的图形学习方法。GNNs由于其令人信服的性能而得到了广泛的应用。不幸的是,使用GNN的一个主要障碍是GNN需要大量的时间和资源进行训练。最近,一种新的图形数据学习方法是图形神经切线核(GNTK)[Du,Hou,Salakhutdinov,Poczos,Wang和Xu 19]。GNTK是神经切线核(NTK)[Jacot,Gabriel和Hongler 18](核方法)在图形数据上的应用,求解NTK回归相当于使用梯度下降训练无限宽的神经网络。使用GNTK的主要好处是,与任何内核方法类似,GNTK的参数可以在一个步骤中直接求解。这可以避免耗时的梯度下降。同时,草图已经越来越多地用于加速各种优化问题,包括求解核回归。给定一个$n$图的内核矩阵,在解决内核回归时使用草图可以将运行时间减少到$o(n^3)$。但不幸的是,这些方法通常需要事先对内核矩阵有广泛的了解,而在GNTK的情况下,我们发现内核矩阵的构造已经是$O(n^2N^4)$,假设每个图有$n$个节点。当图形$N$的大小增加时,内核矩阵构造时间可能是一个主要的性能瓶颈。因此,一个自然的问题是,我们是否可以加快内核矩阵的构造,以提高GNTK回归的端到端运行时间。本文给出了第一个在$o(n^2N^3)$运行时间内构造核矩阵的算法。 摘要:Many deep learning tasks have to deal with graphs (e.g., protein structures, social networks, source code abstract syntax trees). Due to the importance of these tasks, people turned to Graph Neural Networks (GNNs) as the de facto method for learning on graphs. GNNs have become widely applied due to their convincing performance. Unfortunately, one major barrier to using GNNs is that GNNs require substantial time and resources to train. Recently, a new method for learning on graph data is Graph Neural Tangent Kernel (GNTK) [Du, Hou, Salakhutdinov, Poczos, Wang and Xu 19]. GNTK is an application of Neural Tangent Kernel (NTK) [Jacot, Gabriel and Hongler 18] (a kernel method) on graph data, and solving NTK regression is equivalent to using gradient descent to train an infinite-wide neural network. The key benefit of using GNTK is that, similar to any kernel method, GNTK's parameters can be solved directly in a single step. This can avoid time-consuming gradient descent. Meanwhile, sketching has become increasingly used in speeding up various optimization problems, including solving kernel regression. Given a kernel matrix of $n$ graphs, using sketching in solving kernel regression can reduce the running time to $o(n^3)$. But unfortunately such methods usually require extensive knowledge about the kernel matrix beforehand, while in the case of GNTK we find that the construction of the kernel matrix is already $O(n^2N^4)$, assuming each graph has $N$ nodes. The kernel matrix construction time can be a major performance bottleneck when the size of graphs $N$ increases. A natural question to ask is thus whether we can speed up the kernel matrix construction to improve GNTK regression's end-to-end running time. This paper provides the first algorithm to construct the kernel matrix in $o(n^2N^3)$ running time.
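作为背景说明,下面用 numpy 给出核回归"一步闭式求解参数"的极简示意(这里用 RBF 核和假设的正则系数代替 GNTK,仅为占位):正因为回归本身只需一次线性求解,核矩阵的构建时间才会成为摘要中所说的端到端瓶颈。

```python
import numpy as np

def kernel_ridge_fit(K: np.ndarray, y: np.ndarray, lam: float = 1e-2):
    """核岭回归闭式解:alpha = (K + lam*I)^{-1} y,一步完成,无需梯度下降。"""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def kernel_ridge_predict(K_test_train: np.ndarray, alpha: np.ndarray):
    return K_test_train @ alpha

# 玩具示例:用 RBF 核代替 GNTK(仅作演示)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X[:, 0] + 0.1 * rng.normal(size=20)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 5.0)              # 这一步对应"核矩阵构建",是主要开销所在
alpha = kernel_ridge_fit(K, y)
print(kernel_ridge_predict(K, alpha)[:3])
```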

【6】 DMGCRN: Dynamic Multi-Graph Convolution Recurrent Network for Traffic Forecasting 标题:DMGCRN:用于交通量预测的动态多图卷积递归网络 链接:https://arxiv.org/abs/2112.02264

作者:Yanjun Qin,Yuchen Fang,Haiyong Luo,Fang Zhao,Chenxing Wang 机构: and Chenxing Wang are withthe School of Computer Science (National Pilot Software EngineeringSchool), Beijing University of Posts and Telecommunications 备注:10 pages 摘要:交通预测是智能交通系统(ITS)的一个问题,对个人和公共机构至关重要。因此,研究者们非常重视处理交通系统复杂的时空相关性,以便进行准确的预测。然而,存在两个挑战:1)大多数交通预测研究主要关注相邻传感器的相关性建模,而忽略了遥感器的相关性,例如具有相似时空模式的商业区;2) 现有的图卷积网络(GCN)中使用静态邻接矩阵的方法不足以反映交通系统的动态空间相关性。此外,细粒度方法使用自关注来建模所有传感器的动态相关性,忽略了道路网络中的层次信息,并且具有二次计算复杂性。本文提出了一种新的动态多图卷积递归网络(DMGCRN)来解决上述问题,它可以同时对距离的空间相关性、结构的空间相关性和时间相关性进行建模。我们不仅使用基于距离的图从距离较近的节点获取空间信息,而且还构建了一种新的潜在图,对道路之间的结构相关性进行编码,以从结构相似的节点获取空间信息。此外,我们将每个传感器的邻域划分为粗粒度区域,并在不同时间动态地为每个区域分配不同的权重。同时,我们将动态多图卷积网络集成到选通递归单元(GRU)中以捕获时间依赖性。在三个真实交通数据集上的大量实验表明,我们提出的算法优于最先进的基线。 摘要:Traffic forecasting is a problem of intelligent transportation systems (ITS) and crucial for individuals and public agencies. Therefore, researches pay great attention to deal with the complex spatio-temporal dependencies of traffic system for accurate forecasting. However, there are two challenges: 1) Most traffic forecasting studies mainly focus on modeling correlations of neighboring sensors and ignore correlations of remote sensors, e.g., business districts with similar spatio-temporal patterns; 2) Prior methods which use static adjacency matrix in graph convolutional networks (GCNs) are not enough to reflect the dynamic spatial dependence in traffic system. Moreover, fine-grained methods which use self-attention to model dynamic correlations of all sensors ignore hierarchical information in road networks and have quadratic computational complexity. In this paper, we propose a novel dynamic multi-graph convolution recurrent network (DMGCRN) to tackle above issues, which can model the spatial correlations of distance, the spatial correlations of structure, and the temporal correlations simultaneously. We not only use the distance-based graph to capture spatial information from nodes are close in distance but also construct a novel latent graph which encoded the structure correlations among roads to capture spatial information from nodes are similar in structure. Furthermore, we divide the neighbors of each sensor into coarse-grained regions, and dynamically assign different weights to each region at different times. Meanwhile, we integrate the dynamic multi-graph convolution network into the gated recurrent unit (GRU) to capture temporal dependence. Extensive experiments on three real-world traffic datasets demonstrate that our proposed algorithm outperforms state-of-the-art baselines.

【7】 Incentive Compatible Pareto Alignment for Multi-Source Large Graphs 标题:多源大型图的激励相容Pareto对齐 链接:https://arxiv.org/abs/2112.02792

作者:Jian Liang,Fangrui Lv,Di Liu,Zehui Dai,Xu Tian,Shuang Li,Fei Wang,Han Li 机构:Alibaba Group, China, Beijing Institute of Technology, China, Department of Population Health Sciences, Weill Cornell Medicine, USA 摘要:在本文中,我们主要研究在多源大规模数据上学习有效的实体匹配模型。对于实际应用,我们放松了数据分布/空间或实体标识在源之间共享的典型假设,并提出了一个放松的多源大规模实体匹配(RMLE)问题。该问题的挑战包括1)如何在来源之间协调大型实体以共享信息,以及2)如何减轻联合学习多源数据的负迁移。更糟糕的是,一个实际问题是两个挑战之间的纠缠。具体而言,不正确的排列可能会增加负转移;而减轻一个源的负迁移可能会导致其他源的表示学习不良,进而降低对齐精度。为了应对纠缠挑战,我们指出,关键是首先在帕累托前沿优化的基础上优化信息共享,通过显示信息共享显著影响描述负转移下限的帕累托前沿。因此,我们提出了一种激励相容帕累托比对(ICPA)方法,首先基于帕累托前沿优化优化跨源比对,然后缓解优化比对上的负转移约束。这种机制使得每个源都可以根据其真实偏好进行学习,而不必担心其他源的表示会恶化。具体而言,帕累托前沿优化鼓励最小化负转移的下限,从而优化是否对齐以及对齐哪个。在四个大型数据集上提供了综合的实证评估结果,以证明ICPA的有效性和优越性。搜索广告平台上的在线A/B测试结果也证明了ICPA在生产环境中的有效性。 摘要:In this paper, we focus on learning effective entity matching models over multi-source large-scale data. For real applications, we relax typical assumptions that data distributions/spaces, or entity identities are shared between sources, and propose a Relaxed Multi-source Large-scale Entity-matching (RMLE) problem. Challenges of the problem include 1) how to align large-scale entities between sources to share information and 2) how to mitigate negative transfer from joint learning multi-source data. What's worse, one practical issue is the entanglement between both challenges. Specifically, incorrect alignments may increase negative transfer; while mitigating negative transfer for one source may result in poorly learned representations for other sources and then decrease alignment accuracy. To handle the entangled challenges, we point out that the key is to optimize information sharing first based on Pareto front optimization, by showing that information sharing significantly influences the Pareto front which depicts lower bounds of negative transfer. Consequently, we proposed an Incentive Compatible Pareto Alignment (ICPA) method to first optimize cross-source alignments based on Pareto front optimization, then mitigate negative transfer constrained on the optimized alignments. This mechanism renders each source can learn based on its true preference without worrying about deteriorating representations of other sources. Specifically, the Pareto front optimization encourages minimizing lower bounds of negative transfer, which optimizes whether and which to align. Comprehensive empirical evaluation results on four large-scale datasets are provided to demonstrate the effectiveness and superiority of ICPA. Online A/B test results at a search advertising platform also demonstrate the effectiveness of ICPA in production environments.

Transformer(3篇)

【1】 Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks 标题:离线预训练多Agent决策转换器:一个大序列模型征服所有StarCraftII任务 链接:https://arxiv.org/abs/2112.02845

作者:Linghui Meng,Muning Wen,Yaodong Yang,Chenyang Le,Xiyun Li,Weinan Zhang,Ying Wen,Haifeng Zhang,Jun Wang,Bo Xu 机构:Institute of Automation, CAS, China,School of Artificial Intelligence, UCAS, China, Shanghai Jiao Tong University,King’s College London, University College London 备注:17 pages, 6 figures 摘要:离线强化学习利用静态数据集学习最佳策略,无需访问环境。由于多智能体在线交互的昂贵性和训练过程中对样本数量的要求,这种技术适合于多智能体学习任务。然而,在多智能体强化学习(MARL)中,离线预训练与在线微调的范例从未被研究过,离线MARL研究的数据集或基准也不可用。在本文中,我们试图回答以下问题:MARL中的离线预训练是否能够学习有助于提高多个下游任务性能的通用策略表示。我们首先介绍了第一个基于StarCraftII环境的具有不同质量级别的离线MARL数据集,然后提出了一种新的用于有效离线学习的多智能体决策转换器(MADT)体系结构。MADT利用Transformer的时间表示建模能力,并将其与离线和在线MARL任务集成。MADT的一个重要优点是,它学习可在不同任务场景下在不同类型的代理之间传输的通用策略。在星际争霸II离线数据集上进行评估时,MADT的性能优于最先进的离线RL基线。当应用于在线任务时,预先训练的MADT显著提高了样本效率,即使在Zero-Shot的情况下也有很强的性能。据我们所知,这是第一项研究和证明离线预训练模型在MARL中的样本效率和通用性增强方面的有效性的工作。 摘要:Offline reinforcement learning leverages static datasets to learn optimal policies with no necessity to access the environment. This technique is desirable for multi-agent learning tasks due to the expensiveness of agents' online interactions and the demanding number of samples during training. Yet, in multi-agent reinforcement learning (MARL), the paradigm of offline pre-training with online fine-tuning has never been studied, nor datasets or benchmarks for offline MARL research are available. In this paper, we try to answer the question of whether offline pre-training in MARL is able to learn generalisable policy representations that can help improve the performance of multiple downstream tasks. We start by introducing the first offline MARL dataset with diverse quality levels based on the StarCraftII environment, and then propose the novel architecture of multi-agent decision transformer (MADT) for effective offline learning. MADT leverages Transformer's modelling ability of temporal representations and integrates it with both offline and online MARL tasks. A crucial benefit of MADT is that it learns generalisable policies that can transfer between different types of agents under different task scenarios. When evaluated on StarCraft II offline dataset, MADT demonstrates superior performance than state-of-the-art offline RL baselines. When applied to online tasks, the pre-trained MADT significantly improves sample efficiency, and enjoys strong performance even in zero-shot cases. To our best knowledge, this is the first work that studies and demonstrates the effectiveness of offline pre-trained models in terms of sample efficiency and generalisability enhancements in MARL.
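下面用纯 Python/numpy 给出"把离线轨迹改写成 (return-to-go, 状态, 动作) 交替排列的 token 序列"的示意片段(字段名与折扣因子为假设),这是决策 Transformer 一类方法(包括 MADT)进行序列建模时常见的输入组织方式:

```python
import numpy as np

def to_decision_transformer_tokens(states, actions, rewards, gamma=1.0):
    """把一条轨迹转换成 (return-to-go, state, action) 交替排列的序列(示意)。"""
    rtg, running = [], 0.0
    for r in reversed(rewards):            # 自后向前累积,得到每个时刻的 return-to-go
        running = r + gamma * running
        rtg.append(running)
    rtg = rtg[::-1]
    tokens = []
    for g, s, a in zip(rtg, states, actions):
        tokens.extend([("rtg", g), ("state", s), ("action", a)])
    return tokens

states = [np.zeros(3), np.ones(3), 2 * np.ones(3)]
actions = [0, 1, 0]
rewards = [0.0, 0.0, 1.0]
print(to_decision_transformer_tokens(states, actions, rewards)[:3])
```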

【2】 STformer: A Noise-Aware Efficient Spatio-Temporal Transformer Architecture for Traffic Forecasting 标题:STformer:一种噪声感知的高效流量预测时空转换器结构 链接:https://arxiv.org/abs/2112.02740

作者:Yanjun Qin,Yuchen Fang,Haiyong Luo,Liang Zeng,Fang Zhao,Chenxing Wang 机构:Beijing University of Posts and, Telecommunications, Institute of Computing Technology, Chinese Academy of Sciences, Tsinghua University 摘要:交通预测在智能交通系统中起着不可或缺的作用,它使日常出行更加方便和安全。然而,时空相关性的动态演化使得准确的交通预测非常困难。现有工作主要采用图神经网络(GNNs)和深度时间序列模型(例如,递归神经网络)来捕获动态交通系统中的复杂时空模式。对于空间模式,GNNs很难提取全局空间信息,即道路网络中远距离传感器的信息。虽然我们可以像以前的工作一样利用自我注意来提取全局空间信息,但它也伴随着巨大的资源消耗。就时间模式而言,交通数据不仅包含易于识别的每日和每周趋势,还包含难以识别的、由事故(如车祸和雷雨)造成的短期噪声。现有的交通模型很难区分时间序列中复杂的时间模式,因此很难得到准确的时间依赖关系。为了解决上述问题,我们提出了一种新的噪声感知高效时空Transformer结构,用于准确的交通预测,名为STformer。STformer由两部分组成,即噪声感知的时间自我注意(NATSA)和基于图的稀疏空间自我注意(GBS3A)。NATSA从时间序列中分离出高频分量和低频分量,分别通过可学习滤波器和时间自我注意去除噪声和捕获稳定的时间相关性。GBS3A用基于图的稀疏查询取代了普通自我注意中的完整查询,以减少时间和内存使用。在四个真实交通数据集上的实验表明,STformer以较低的计算成本优于最新的基线。 摘要:Traffic forecasting plays an indispensable role in the intelligent transportation system, which makes daily travel more convenient and safer. However, the dynamic evolution of spatio-temporal correlations makes accurate traffic forecasting very difficult. Existing work mainly employs graph neural networks (GNNs) and deep time series models (e.g., recurrent neural networks) to capture complex spatio-temporal patterns in the dynamic traffic system. For the spatial patterns, it is difficult for GNNs to extract the global spatial information, i.e., remote sensors information in road networks. Although we can use the self-attention to extract global spatial information as in the previous work, it is also accompanied by huge resource consumption. For the temporal patterns, traffic data have not only easy-to-recognize daily and weekly trends but also difficult-to-recognize short-term noise caused by accidents (e.g., car accidents and thunderstorms). It is difficult for prior traffic models to distinguish intricate temporal patterns in time series and thus hard to get accurate temporal dependence. To address the above issues, we propose a novel noise-aware efficient spatio-temporal Transformer architecture for accurate traffic forecasting, named STformer. STformer consists of two components, which are the noise-aware temporal self-attention (NATSA) and the graph-based sparse spatial self-attention (GBS3A). NATSA separates the high-frequency component and the low-frequency component from the time series to remove noise and capture stable temporal dependence by the learnable filter and the temporal self-attention, respectively. GBS3A replaces the full query in vanilla self-attention with the graph-based sparse query to decrease the time and memory usage. Experiments on four real-world traffic datasets show that STformer outperforms state-of-the-art baselines with lower computational cost.
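下面是 NATSA 中"把时间序列分解为低频稳定成分与高频噪声成分"这一步骤的极简示意(用固定的 rFFT 截止比例代替论文中的可学习滤波器,属于假设性简化):

```python
import numpy as np

def frequency_split(x: np.ndarray, keep_ratio: float = 0.2):
    """用 rFFT 把序列拆成低频(趋势)与高频(噪声)两部分;keep_ratio 为假设的截止比例。"""
    spec = np.fft.rfft(x)
    cutoff = max(1, int(len(spec) * keep_ratio))
    low_spec = np.zeros_like(spec)
    low_spec[:cutoff] = spec[:cutoff]
    low = np.fft.irfft(low_spec, n=len(x))   # 低频成分:日/周趋势等稳定模式
    high = x - low                            # 高频成分:事故等造成的短期噪声
    return low, high

t = np.arange(288)   # 假设一天 288 个 5 分钟时间片
series = np.sin(2 * np.pi * t / 288) + 0.3 * np.random.default_rng(0).normal(size=288)
trend, noise = frequency_split(series)
print(trend[:3], noise[:3])
```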

【3】 NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference 标题:NN-LUT:用于有效Transformer推理的非线性运算的神经逼近 链接:https://arxiv.org/abs/2112.02191

作者:Joonsang Yu,Junki Park,Seongmin Park,Minsoo Kim,Sihwa Lee,Dong Hyun Lee,Jungwook Choi 机构:NAVER AI Lab, Face, NAVER Clova, SAIT, Hanyang University 备注:7 pages, 3 figures 摘要:非线性操作(如GELU、层规范化和Softmax)是Transformer模型的重要组成部分,但成本高昂。以前的一些工作通过查找表或整数计算简化了这些操作,但这种近似方法的精度较低,或者硬件成本较高,延迟时间较长。本文提出了一个精确且硬件友好的近似框架,用于有效的Transformer推断。我们的框架采用一个简单的神经网络作为通用逼近器,其结构等价地转换为LUT。提出的称为NN-LUT的框架可以准确地替换流行的BERT模型中的所有非线性操作,从而显著减少面积、功耗和延迟。 摘要:Non-linear operations such as GELU, Layer normalization, and Softmax are essential yet costly building blocks of Transformer models. Several prior works simplified these operations with look-up tables or integer computations, but such approximations suffer inferior accuracy or considerable hardware cost with long latency. This paper proposes an accurate and hardware-friendly approximation framework for efficient Transformer inference. Our framework employs a simple neural network as a universal approximator with its structure equivalently transformed into a LUT. The proposed framework called NN-LUT can accurately replace all the non-linear operations in popular BERT models with significant reductions in area, power consumption, and latency.
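下面用 numpy 给出"把非线性算子物化为查找表(LUT),推理时只做查表加线性插值"这一思路的示意。为保持自包含,这里直接在网格上打表 GELU 的近似式,省略了论文中"先训练一个小网络再等价转换为 LUT"的步骤;表项数量与取值区间均为假设。

```python
import numpy as np

def gelu(x):
    """GELU 的 tanh 近似式。"""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def build_lut(fn, lo=-8.0, hi=8.0, entries=64):
    """在固定网格上把非线性函数物化为查找表。"""
    grid = np.linspace(lo, hi, entries)
    return grid, fn(grid)

def lut_apply(x, grid, table):
    """推理时仅做查表 + 线性插值,替代昂贵的非线性计算。"""
    return np.interp(x, grid, table)

grid, table = build_lut(gelu)
x = np.linspace(-6, 6, 1000)
err = np.max(np.abs(gelu(x) - lut_apply(x, grid, table)))
print(f"64 项 LUT 的最大逼近误差: {err:.4f}")
```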

GAN|对抗|攻击|生成相关(6篇)

【1】 Simulation Intelligence: Towards a New Generation of Scientific Methods 标题:仿真智能:迈向新一代科学方法 链接:https://arxiv.org/abs/2112.03235

作者:Alexander Lavin,Hector Zenil,Brooks Paige,David Krakauer,Justin Gottschlich,Tim Mattson,Anima Anandkumar,Sanjay Choudry,Kamil Rocki,Atılım Güneş Baydin,Carina Prunkl,Brooks Paige,Olexandr Isayev,Erik Peterson,Peter L. McMahon,Jakob Macke,Kyle Cranmer,Jiaxin Zhang,Haruko Wainwright,Adi Hanuka,Manuela Veloso,Samuel Assefa,Stephan Zheng,Avi Pfeffer 机构:Institute for Simulation Intelligence, Alan Turing Institute, Santa Fe Institute, Intel Labs, Nvidia, Neuralink, Atılım Güne¸s Baydin, University of Oxford, Carnegie Mellon University, Cornell University, Jakob H. Macke, University of Tübingen, New York University 摘要:最初的“七个主题”提出了科学计算领域基本方法的路线图,其中主题是捕获计算和数据移动模式的算法方法。我们提出了“模拟智能的九个主题”,这是一个开发和集成科学计算、科学模拟和人工智能合并所需的基本算法的路线图。我们简称这种合并模拟智能(SI)。我们认为,模拟智能的主题是相互关联和相互依存的,就像操作系统各层中的组件一样。利用这一隐喻,我们探索了仿真智能操作系统堆栈(SI堆栈)的每一层的本质以及其中的主题:(1)多物理和多尺度建模;(2) 代理建模与仿真;(3) 基于仿真的推理;(4) 因果建模与推理;(5) 基于Agent的建模;(6) 概率规划;(7) 可微规划;(8) 开放式优化;(9) 机器编程。我们相信,母题之间的协调努力为加速科学发现提供了巨大的机会,从解决合成生物学和气候科学中的反问题,到指导核能实验和预测社会经济环境中的紧急行为。我们详细阐述了SI堆栈的每一层,详细介绍了最先进的方法,展示了突出挑战和机遇的示例,并倡导通过具体方式推进主题及其组合的协同效应。推进和集成这些技术可以实现一种健壮、高效的假设模拟分析类型的科学方法,我们将通过几个人机团队和自动化科学的用例介绍这种方法。 摘要:The original "Seven Motifs" set forth a roadmap of essential methods for the field of scientific computing, where a motif is an algorithmic method that captures a pattern of computation and data movement. We present the "Nine Motifs of Simulation Intelligence", a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence. We call this merger simulation intelligence (SI), for short. We argue the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system. Using this metaphor, we explore the nature of each layer of the simulation intelligence operating system stack (SI-stack) and the motifs therein: (1) Multi-physics and multi-scale modeling; (2) Surrogate modeling and emulation; (3) Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based modeling; (6) Probabilistic programming; (7) Differentiable programming; (8) Open-ended optimization; (9) Machine programming. We believe coordinated efforts between motifs offers immense opportunity to accelerate scientific discovery, from solving inverse problems in synthetic biology and climate science, to directing nuclear energy experiments and predicting emergent behavior in socioeconomic settings. We elaborate on each layer of the SI-stack, detailing the state-of-art methods, presenting examples to highlight challenges and opportunities, and advocating for specific ways to advance the motifs and the synergies from their combinations. Advancing and integrating these technologies can enable a robust and efficient hypothesis-simulation-analysis type of scientific method, which we introduce with several use-cases for human-machine teaming and automated science.

【2】 Context-Aware Transfer Attacks for Object Detection 标题:面向对象检测的上下文感知传输攻击 链接:https://arxiv.org/abs/2112.03223

作者:Zikui Cai,Xinxin Xie,Shasha Li,Mingjun Yin,Chengyu Song,Srikanth V. Krishnamurthy,Amit K. Roy-Chowdhury,M. Salman Asif 机构: Electrical and Computer Engineering, University of California Riverside, Computer Science and Engineering, University of California Riverside 备注:accepted to AAAI 2022 摘要:近年来,针对图像分类器的黑盒迁移攻击得到了广泛的研究。相比之下,针对目标检测器的迁移攻击进展甚微。目标检测器对图像进行整体查看,一个对象(或其缺失)的检测通常取决于场景中的其他对象。这使得此类检测器天生具有上下文感知能力,针对它们的对抗性攻击也比针对图像分类器的攻击更具挑战性。在本文中,我们提出了一种新的方法来生成针对目标检测器的上下文感知攻击。我们证明,通过使用对象及其相对位置和大小的共现性作为上下文信息,我们可以成功地生成有针对性的错误分类攻击,在黑盒目标检测器上实现比最新技术更高的迁移成功率。我们使用PASCAL VOC和MS COCO数据集的图像在各种目标检测器上测试我们的方法,并证明与其他最先进的方法相比,性能最多可提高20个百分点。 摘要:Blackbox transfer attacks for image classifiers have been extensively studied in recent years. In contrast, little progress has been made on transfer attacks for object detectors. Object detectors take a holistic view of the image and the detection of one object (or lack thereof) often depends on other objects in the scene. This makes such detectors inherently context-aware and adversarial attacks in this space are more challenging than those targeting image classifiers. In this paper, we present a new approach to generate context-aware attacks for object detectors. We show that by using co-occurrence of objects and their relative locations and sizes as context information, we can successfully generate targeted mis-categorization attacks that achieve higher transfer success rates on blackbox object detectors than the state-of-the-art. We test our approach on a variety of object detectors with images from PASCAL VOC and MS COCO datasets and demonstrate up to 20 percentage points improvement in performance compared to the other state-of-the-art methods.

【3】 ML Attack Models: Adversarial Attacks and Data Poisoning Attacks 标题:ML攻击模型:对抗性攻击和数据中毒攻击 链接:https://arxiv.org/abs/2112.02797

作者:Jing Lin,Long Dang,Mohamed Rahouti,Kaiqi Xiong 机构: Corresponding Author, Contents 摘要:许多最先进的ML模型在各种任务(如图像分类)中的表现都优于人类。由于具有如此优异的性能,ML模型今天得到了广泛的应用。然而,对抗性攻击和数据中毒攻击的存在确实对ML模型的鲁棒性提出了质疑。例如,Engstrom等人证明,最先进的图像分类器很容易被任意图像上的小旋转所愚弄。随着ML系统越来越多地集成到安全和安保敏感的应用程序中,对抗性攻击和数据中毒攻击构成了相当大的威胁。本章重点介绍ML安全的两个广泛而重要的领域:对抗性攻击和数据中毒攻击。 摘要:Many state-of-the-art ML models have outperformed humans in various tasks such as image classification. With such outstanding performance, ML models are widely used today. However, the existence of adversarial attacks and data poisoning attacks really questions the robustness of ML models. For instance, Engstrom et al. demonstrated that state-of-the-art image classifiers could be easily fooled by a small rotation on an arbitrary image. As ML systems are being increasingly integrated into safety and security-sensitive applications, adversarial attacks and data poisoning attacks pose a considerable threat. This chapter focuses on the two broad and important areas of ML security: adversarial attacks and data poisoning attacks.

【4】 Stochastic Local Winner-Takes-All Networks Enable Profound Adversarial Robustness 标题:随机本地赢家通吃网络实现深刻的对手鲁棒性 链接:https://arxiv.org/abs/2112.02671

作者:Konstantinos P. Panousis,Sotirios Chatzis,Sergios Theodoridis 机构:Cyprus University of Technology, Limassol, Cyprus, National and Kapodistrian University of Athens, Greece, Aalborg University, Aalborg, Denmark 备注:Bayesian Deep Learning Workshop, NeurIPS 2021 摘要:这项工作探索了基于随机竞争的激活,即随机局部赢家通吃(LWTA),对抗强大的(基于梯度的)白盒和黑盒对抗攻击的效力;我们特别关注对抗性训练环境。在我们的工作中,我们将传统的基于ReLU的非线性替换为包含局部和随机竞争线性单元的块。每个网络层的输出现在产生稀疏输出,这取决于每个块中的优胜者采样结果。我们依靠变分贝叶斯框架进行训练和推理;我们结合了传统的基于PGD的对抗训练论点,以提高整体对抗鲁棒性。正如我们的实验所表明的,新兴网络对强大的对抗性攻击具有最先进的鲁棒性,同时在良性情况下保持非常高的分类率。 摘要:This work explores the potency of stochastic competition-based activations, namely Stochastic Local Winner-Takes-All (LWTA), against powerful (gradient-based) white-box and black-box adversarial attacks; we especially focus on Adversarial Training settings. In our work, we replace the conventional ReLU-based nonlinearities with blocks comprising locally and stochastically competing linear units. The output of each network layer now yields a sparse output, depending on the outcome of winner sampling in each block. We rely on the Variational Bayesian framework for training and inference; we incorporate conventional PGD-based adversarial training arguments to increase the overall adversarial robustness. As we experimentally show, the arising networks yield state-of-the-art robustness against powerful adversarial attacks while retaining very high classification rate in the benign case.
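下面给出随机 LWTA 单元前向传播的一个 numpy 示意(块大小、采样方式均为假设的简化,未包含论文中的变分贝叶斯训练):每个块内的若干线性单元按 softmax 概率采样出唯一"胜者",其余输出被置零,从而得到稀疏的层输出。

```python
import numpy as np

def stochastic_lwta(h: np.ndarray, block_size: int = 2, rng=np.random.default_rng(0)):
    """随机局部赢家通吃:每 block_size 个单元为一块,按 softmax 概率采样一个胜者保留。"""
    n, d = h.shape
    assert d % block_size == 0
    blocks = h.reshape(n, d // block_size, block_size)
    logits = blocks - blocks.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    out = np.zeros_like(blocks)
    for i in range(n):
        for b in range(d // block_size):
            winner = rng.choice(block_size, p=probs[i, b])   # 随机采样块内胜者
            out[i, b, winner] = blocks[i, b, winner]          # 其余单元输出为 0
    return out.reshape(n, d)

h = np.random.default_rng(1).normal(size=(2, 6))   # 假设为某一层的线性输出
print(stochastic_lwta(h))
```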

【5】 Emojich -- zero-shot emoji generation using Russian language: a technical report 标题:Emojich--使用俄语生成Zero-Shot表情符号:技术报告 链接:https://arxiv.org/abs/2112.02448

作者:Alex Shonenkov,Daria Bakshandaeva,Denis Dimitrov,Aleksandr Nikolich 机构:Sber AI, MIPT, Sber AI, Lomonosov MSU, Sber AI, MIREA 备注:5 pages, 4 figures and big figure at appendix, technical report 摘要:本技术报告介绍了一个文本到图像的神经网络“Emojich”,该网络以俄语字幕为条件生成表情符号。我们的目标是在微调阶段保持预训练大模型ruDALL-E Malevich(XL)1.3B参数的泛化能力,同时为生成的图像提供特殊风格。这里介绍了一些工程方法、代码实现、用于复制结果的所有超参数以及每个人都可以创建自己的定制贴纸集的电报机器人。此外,还展示了通过“Emojich”模型获得的一些新生成的表情。 摘要:This technical report presents a text-to-image neural network "Emojich" that generates emojis using captions in Russian language as a condition. We aim to keep the generalization ability of a pretrained big model ruDALL-E Malevich (XL) 1.3B parameters at the fine-tuning stage, while giving special style to the images generated. Here are presented some engineering methods, code realization, all hyper-parameters for reproducing results and a Telegram bot where everyone can create their own customized sets of stickers. Also, some newly generated emojis obtained by "Emojich" model are demonstrated.

【6】 My(o) Armband Leaks Passwords: An EMG and IMU Based Keylogging Side-Channel Attack 标题:我的(O)臂章泄露密码:一种基于EMG和IMU的键盘记录侧通道攻击 链接:https://arxiv.org/abs/2112.02382

作者:Matthias Gazzari,Annemarie Mattmann,Max Maass,Matthias Hollick 机构: Technical University of Darmstadt 备注:None 摘要:不断收集用户各种传感器数据的可穿戴设备增加了推断无意和敏感信息(如在物理键盘上键入的密码)的机会。我们深入研究了使用肌电图(EMG)数据的潜力,这是一种市场上新出现的传感器模式,但最近在增强现实(AR)可穿戴设备的背景下引起了人们的注意,用于键盘记录侧通道攻击。我们的方法基于神经网络,用于现实场景中的主体间攻击,使用Myo臂带收集传感器数据。在我们的方法中,与加速度计和陀螺仪相比,EMG数据已被证明是最重要的信息来源,从而提高了击键检测性能。对于我们对原始数据的端到端方法,我们报告击键检测的平均平衡精度约为76%,对于不同强度密码的密钥识别,52个类别的前3名密钥平均精度约为32%。我们已经创建了一个广泛的数据集,包括37名志愿者记录的超过310000次击键,该数据集与用于创建给定结果的源代码一起作为开放访问提供。 摘要:Wearables that constantly collect various sensor data of their users increase the chances for inferences of unintentional and sensitive information such as passwords typed on a physical keyboard. We take a thorough look at the potential of using electromyographic (EMG) data, a sensor modality which is new to the market but has lately gained attention in the context of wearables for augmented reality (AR), for a keylogging side-channel attack. Our approach is based on neural networks for a between-subject attack in a realistic scenario using the Myo Armband to collect the sensor data. In our approach, the EMG data has proven to be the most prominent source of information compared to the accelerometer and gyroscope, increasing the keystroke detection performance. For our end-to-end approach on raw data, we report a mean balanced accuracy of about 76 % for the keystroke detection and a mean top-3 key accuracy of about 32 % on 52 classes for the key identification on passwords of varying strengths. We have created an extensive dataset including more than 310 000 keystrokes recorded from 37 volunteers, which is available as open access along with the source code used to create the given results.

半/弱/无/有监督|不确定性|主动学习(5篇)

【1】 Active Learning Meets Optimized Item Selection 标题:主动学习与优化项目选择的结合 链接:https://arxiv.org/abs/2112.03105

作者:Bernard Kleynhans,Xin Wang,Serdar Kadıoğlu 机构:AI Center of Excellence, Fidelity Investments, Boston, USA 备注:IJCAI 2021 Data Science Meets Optimization Workshop (DSO@IJCAI 2021) 摘要:设计具有有限或没有可用训练数据的推荐系统仍然是一项挑战。为此,提出了一个新的组合优化问题,用于生成用于实验的优化项目选择,目的是缩短随机训练数据的收集时间。我们首先介绍了优化项目选择问题的概况和解决该问题的多层次优化框架。该方法集成了离散优化、无监督聚类和潜在文本嵌入等技术。然后,我们将讨论如何将优化项目选择与主动学习结合起来,作为随机探索的一部分。 摘要:Designing recommendation systems with limited or no available training data remains a challenge. To that end, a new combinatorial optimization problem is formulated to generate optimized item selection for experimentation with the goal to shorten the time for collecting randomized training data. We first present an overview of the optimized item selection problem and a multi-level optimization framework to solve it. The approach integrates techniques from discrete optimization, unsupervised clustering, and latent text embeddings. We then discuss how to incorporate optimized item selection with active learning as part of randomized exploration in an ongoing fashion.

【2】 A Tale of Color Variants: Representation and Self-Supervised Learning in Fashion E-Commerce 标题:颜色变体的故事:服装电子商务中的表征和自我监督学习 链接:https://arxiv.org/abs/2112.02910

作者:Ujjal Kr Dutta,Sandeep Repakula,Maulik Parmar,Abhinav Ravi 机构:Data Sciences-Image Sciences, Myntra 备注:In Annual Conference on Innovative Applications of Artificial Intelligence (IAAI)/ AAAI Conference on Artificial Intelligence (AAAI) 2022. arXiv admin note: substantial text overlap with arXiv:2104.08581 摘要:在本文中,我们讨论了时尚电子商务中的一个关键问题(与客户体验以及收入有关):颜色变体识别,即识别在设计(或风格)上完全匹配但仅在颜色上不同的时尚产品。我们提出了一个通用框架,该框架的核心是利用深度视觉表征学习,为我们的时尚电子商务平台解决这个问题。我们的框架可以通过手动获取的三元组形式的监控信号进行训练。然而,在捕获所有困难的情况下,为时尚电子商务平台(如我们的平台)中通常存在的整个庞大数据集合获取手动注释是不可行的。但是,为了拯救我们,有趣的是,我们观察到,时尚电子商务中的这一关键问题也可以通过简单的基于颜色抖动的图像增强来解决,这一点最近在对比自监督学习(SSL)文献中广为流行,该文献旨在学习视觉表示,而不使用手动标签。这自然会在我们的脑海中引出一个问题:我们是否可以在用例中利用SSL,并且仍然可以获得与受监管框架相当的性能?答案是,是的!因为,颜色变化的时尚对象只不过是一种风格的表现形式,不同的颜色,一个经过训练对颜色保持不变的模型(有监督或没有监督)应该能够识别这一点!这是本文进一步从定性和定量两方面论证的内容,同时评估了两种最先进的SSL技术,并提出了一种新方法。 摘要:In this paper, we address a crucial problem in fashion e-commerce (with respect to customer experience, as well as revenue): color variants identification, i.e., identifying fashion products that match exactly in their design (or style), but only to differ in their color. We propose a generic framework, that leverages deep visual Representation Learning at its heart, to address this problem for our fashion e-commerce platform. Our framework could be trained with supervisory signals in the form of triplets, that are obtained manually. However, it is infeasible to obtain manual annotations for the entire huge collection of data usually present in fashion e-commerce platforms, such as ours, while capturing all the difficult corner cases. But, to our rescue, interestingly we observed that this crucial problem in fashion e-commerce could also be solved by simple color jitter based image augmentation, that recently became widely popular in the contrastive Self-Supervised Learning (SSL) literature, that seeks to learn visual representations without using manual labels. This naturally led to a question in our mind: Could we leverage SSL in our use-case, and still obtain comparable performance to our supervised framework? The answer is, Yes! because, color variant fashion objects are nothing but manifestations of a style, in different colors, and a model trained to be invariant to the color (with, or without supervision), should be able to recognize this! This is what the paper further demonstrates, both qualitatively, and quantitatively, while evaluating a couple of state-of-the-art SSL techniques, and also proposing a novel method.
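作为摘要中"仅靠颜色抖动增广即可让表示对颜色保持不变,从而识别同款不同色"这一观察的示意,下面用 numpy 给出为对比自监督学习构造"颜色抖动正样本对"的极简片段(抖动方式与幅度均为假设的简化):

```python
import numpy as np

def color_jitter(img: np.ndarray, rng, brightness=0.4, shift=0.1):
    """对 HWC、取值 [0,1] 的图像做简化的亮度缩放 + 通道偏移(示意性颜色抖动)。"""
    scale = 1.0 + rng.uniform(-brightness, brightness)
    offset = rng.uniform(-shift, shift, size=(1, 1, 3))
    return np.clip(img * scale + offset, 0.0, 1.0)

def make_positive_pair(img, rng=np.random.default_rng(0)):
    """同一款式的两个'颜色变体'视图:对比学习把它们拉近,即学到颜色不变的表示。"""
    return color_jitter(img, rng), color_jitter(img, rng)

img = np.random.default_rng(42).random((64, 64, 3))   # 假设为一张商品图
v1, v2 = make_positive_pair(img)
print(v1.shape, v2.shape)
```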

【3】 Diverse, Global and Amortised Counterfactual Explanations for Uncertainty Estimates 标题:不确定性估计的多样化、全局性和摊销反事实解释 链接:https://arxiv.org/abs/2112.02646

作者:Dan Ley,Umang Bhatt,Adrian Weller 机构:University of Cambridge, UK,The Alan Turing Institute, UK 备注:Accepted as a conference paper to AAAI 2022 摘要:为了从可微概率模型中解释不确定性估计,最近的工作建议为模型不确定的给定数据点生成单个反事实潜在不确定性解释(CLUE),识别输入的单个流形上变化,从而使模型在其预测中变得更加确定。我们将探索范围扩大到检查$\delta$-CLUE,即潜在空间中原始输入的$\delta$球内的一组潜在CLUE。我们研究了这些集合的多样性,发现许多CLUE是多余的;因此,我们提出了多样化CLUE($\nabla$-CLUE),这是一组CLUE,每个CLUE都对如何减少与输入相关的不确定性提出了不同的解释。然后,我们进一步提出了全局摊销CLUE(GLAM-CLUE),这是一种独特而新颖的方法,它可以学习特定不确定输入组上的摊销映射,并在单个函数调用中将其有效地转换为模型确定的输入。我们的实验表明$\delta$-CLUE、$\nabla$-CLUE和GLAM-CLUE都解决了CLUE的缺点,并为实践者提供了不确定性估计的有益解释。 摘要:To interpret uncertainty estimates from differentiable probabilistic models, recent work has proposed generating a single Counterfactual Latent Uncertainty Explanation (CLUE) for a given data point where the model is uncertain, identifying a single, on-manifold change to the input such that the model becomes more certain in its prediction. We broaden the exploration to examine $\delta$-CLUE, the set of potential CLUEs within a $\delta$ ball of the original input in latent space. We study the diversity of such sets and find that many CLUEs are redundant; as such, we propose DIVerse CLUE ($\nabla$-CLUE), a set of CLUEs which each propose a distinct explanation as to how one can decrease the uncertainty associated with an input. We then further propose GLobal AMortised CLUE (GLAM-CLUE), a distinct and novel method which learns amortised mappings on specific groups of uncertain inputs, taking them and efficiently transforming them in a single function call into inputs for which a model will be certain. Our experiments show that $\delta$-CLUE, $\nabla$-CLUE, and GLAM-CLUE all address shortcomings of CLUE and provide beneficial explanations of uncertainty estimates to practitioners.
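下面是 $\delta$-CLUE 思路的一个玩具化 numpy 示意(分类器、潜在空间维度、采样次数与步长均为假设,且用随机搜索代替论文中的优化过程):在潜在表示的 $\delta$ 球内寻找能显著降低预测熵(即提升模型确定性)的多个不同修改,作为一组反事实解释。

```python
import numpy as np

def predictive_entropy(probs):
    return float(-(probs * np.log(probs + 1e-12)).sum())

def delta_clues(z, predict_fn, delta=1.0, n_samples=200, keep=5,
                rng=np.random.default_rng(0)):
    """在 ||dz|| <= delta 的球内随机采样扰动,返回使预测熵下降最多的若干候选解释。"""
    base = predictive_entropy(predict_fn(z))
    candidates = []
    for _ in range(n_samples):
        dz = rng.normal(size=z.shape)
        dz *= delta * rng.random() / np.linalg.norm(dz)      # 缩放到 delta 球内
        gain = base - predictive_entropy(predict_fn(z + dz))  # 熵的下降量
        candidates.append((gain, dz))
    candidates.sort(key=lambda t: -t[0])
    return [dz for gain, dz in candidates[:keep] if gain > 0]

# 玩具分类器:logits 为潜向量的线性函数(仅作占位)
W = np.random.default_rng(1).normal(size=(2, 4))
def predict_fn(z):
    logits = W @ z
    e = np.exp(logits - logits.max())
    return e / e.sum()

z = np.zeros(4)
print(len(delta_clues(z, predict_fn)))
```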

【4】 Probabilistic Deep Learning to Quantify Uncertainty in Air Quality Forecasting 标题:概率深度学习在空气质量预报不确定性量化中的应用 链接:https://arxiv.org/abs/2112.02622

作者:Abdulmajid Murad,Frank Alexander Kraemer,Kerstin Bach,Gavin Taylor 机构:Norwegian University of Science and Technology, United States Naval Academy 备注:None 摘要:数据驱动的空气质量预测最近实现了更准确的短期预测。尽管取得了成功,但大多数当前的数据驱动解决方案都缺乏对模型不确定性的适当量化,无法传达对预测的信任程度。最近,在概率深度学习中开发了几种实用的不确定性估计工具。然而,这些工具在空气质量预测领域尚未得到实证应用和广泛比较。因此,这项工作将最先进的不确定性量化技术应用于实际环境中的空气质量预测。通过大量实验,我们描述了训练概率模型,并基于经验性能、置信度估计的可靠性和实用性评估了它们的预测不确定性。我们还建议使用"自由"对抗训练和利用空气质量数据固有的时间和空间相关性来改进这些模型。我们的实验表明,提出的模型在量化数据驱动的空气质量预测中的不确定性方面比以前的工作表现得更好。总的来说,贝叶斯神经网络提供了一个更可靠的不确定性估计,但在实现和扩展方面具有挑战性。其他可扩展的方法,如deep ensemble、Monte Carlo(MC)dropout和随机加权平均高斯(SWAG),如果应用正确,可以表现良好,但在性能指标上有不同的权衡和细微的变化。最后,我们的结果显示了不确定性估计的实际影响,并证明了概率模型确实更适合于做出明智的决策。代码和数据集位于 https://github.com/Abdulmajid-Murad/deep_probabilistic_forecast 摘要:Data-driven forecasts of air quality have recently achieved more accurate short-term predictions. Despite their success, most of the current data-driven solutions lack proper quantifications of model uncertainty that communicate how much to trust the forecasts. Recently, several practical tools to estimate uncertainty have been developed in probabilistic deep learning. However, there have not been empirical applications and extensive comparisons of these tools in the domain of air quality forecasts. Therefore, this work applies state-of-the-art techniques of uncertainty quantification in a real-world setting of air quality forecasts. Through extensive experiments, we describe training probabilistic models and evaluate their predictive uncertainties based on empirical performance, reliability of confidence estimate, and practical applicability. We also propose improving these models using "free" adversarial training and exploiting temporal and spatial correlation inherent in air quality data. Our experiments demonstrate that the proposed models perform better than previous works in quantifying uncertainty in data-driven air quality forecasts. Overall, Bayesian neural networks provide a more reliable uncertainty estimate but can be challenging to implement and scale. Other scalable methods, such as deep ensemble, Monte Carlo (MC) dropout, and stochastic weight averaging-Gaussian (SWAG), can perform well if applied correctly but with different tradeoffs and slight variations in performance metrics. Finally, our results show the practical impact of uncertainty estimation and demonstrate that, indeed, probabilistic models are more suitable for making informed decisions. Code and dataset are available at https://github.com/Abdulmajid-Murad/deep_probabilistic_forecast
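下面用 numpy 给出摘要中提到的 MC dropout 不确定性估计的示意实现(网络结构、dropout 率与采样次数均为假设):推理时保持随机丢弃,多次前向传播,用预测均值作为点预测、标准差作为不确定性度量。

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)   # 假设的两层回归网络参数
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def forward(x, drop_rate=0.2, stochastic=True):
    """一次前向传播;MC dropout 要求推理时也保持随机丢弃。"""
    h = np.maximum(x @ W1 + b1, 0.0)
    if stochastic:
        mask = (rng.random(h.shape) > drop_rate) / (1 - drop_rate)
        h = h * mask
    return h @ W2 + b2

def mc_dropout_predict(x, n_samples=100):
    """多次随机前向,均值作点预测,标准差作不确定性。"""
    preds = np.stack([forward(x) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

x = rng.normal(size=(5, 8))
mean, std = mc_dropout_predict(x)
print(mean.ravel(), std.ravel())
```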

【5】 Robust Active Learning: Sample-Efficient Training of Robust Deep Learning Models 标题:鲁棒主动学习:稳健深度学习模型的样本效率训练 链接:https://arxiv.org/abs/2112.02542

作者:Yuejun Guo,Qiang Hu,Maxime Cordy,Mike Papadakis,Yves Le Traon 机构:University of Luxembourg, Luxembourg 备注:10 pages 摘要:主动学习是一种成熟的技术,可以降低标记成本,从而建立高质量的机器学习模型。主动学习的一个核心组成部分是采集函数,用于确定应选择哪些数据进行注释。最先进的采集函数——更重要的是,主动学习技术——被设计为最大限度地提高干净的性能(例如准确性),而忽略了鲁棒性,这是一个受到越来越多关注的重要质量特性。因此,主动学习可以生成精确但不健壮的模型。在本文中,我们提出了鲁棒主动学习(robust active learning),这是一种集成对抗性训练的主动学习过程,而对抗性训练是产生鲁棒模型的最成熟的方法。通过对11个采集函数、4个数据集、6个DNN体系结构和15105个经过训练的DNN的实证研究,我们表明鲁棒主动学习可以生成鲁棒性(对抗性示例的准确度)在2.35%到63.85%之间的模型,而标准主动学习系统实现的鲁棒性可以忽略不计(小于0.20%)。然而,我们的研究还表明,在稳健性方面,准确度表现良好的采集函数比随机采样差。因此,我们研究了这背后的原因,并设计了一种新的采集函数,同时以干净的性能和鲁棒性为目标。我们的采集函数——命名为基于密度的鲁棒熵采样(DRE)——在鲁棒性方面比其他采集函数(包括随机)最多高出24.40%(其中特别是比随机采样高出3.84%),同时在准确性方面保持竞争力。此外,我们还证明了DRE可作为模型再训练的测试选择指标,在鲁棒性上比所有对比函数最多高出8.21%。 摘要:Active learning is an established technique to reduce the labeling cost to build high-quality machine learning models. A core component of active learning is the acquisition function that determines which data should be selected to annotate. State-of-the-art acquisition functions -- and more largely, active learning techniques -- have been designed to maximize the clean performance (e.g. accuracy) and have disregarded robustness, an important quality property that has received increasing attention. Active learning, therefore, produces models that are accurate but not robust. In this paper, we propose robust active learning, an active learning process that integrates adversarial training -- the most established method to produce robust models. Via an empirical study on 11 acquisition functions, 4 datasets, 6 DNN architectures, and 15105 trained DNNs, we show that robust active learning can produce models with the robustness (accuracy on adversarial examples) ranging from 2.35% to 63.85%, whereas standard active learning systematically achieves negligible robustness (less than 0.20%). Our study also reveals, however, that the acquisition functions that perform well on accuracy are worse than random sampling when it comes to robustness. We, therefore, examine the reasons behind this and devise a new acquisition function that targets both clean performance and robustness. Our acquisition function -- named density-based robust sampling with entropy (DRE) -- outperforms the other acquisition functions (including random) in terms of robustness by up to 24.40% (3.84% over random sampling in particular), while remaining competitive on accuracy. Additionally, we prove that DRE is applicable as a test selection metric for model retraining and stands out from all compared functions by up to 8.21% robustness.
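下面给出 DRE(基于密度的鲁棒熵采样)核心打分思路的一个示意实现(密度估计方式、组合形式与 k 值均为假设的简化,并非论文原始定义):同时偏好"预测熵高"(信息量大)且"处于高密度区域"(更具代表性)的未标注样本。

```python
import numpy as np

def entropy(probs):
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def dre_scores(probs, feats, k=10):
    """DRE 式打分(示意):预测熵 × 局部密度,密度用 k 近邻平均距离的倒数近似。"""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    knn_dist = np.sort(d, axis=1)[:, :k].mean(axis=1)
    density = 1.0 / (knn_dist + 1e-12)
    return entropy(probs) * density

def select_batch(probs, feats, budget=8):
    """选出得分最高的 budget 个未标注样本送去标注。"""
    return np.argsort(-dre_scores(probs, feats))[:budget]

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 32))                 # 假设的未标注样本特征
logits = rng.normal(size=(100, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(select_batch(probs, feats))
```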

迁移|Zero/Few/One-Shot|自适应(10篇)

【1】 Prototypical Model with Novel Information-theoretic Loss Function for Generalized Zero Shot Learning 标题:基于新信息论损失函数的广义Zero-Shot学习原型模型 链接:https://arxiv.org/abs/2112.03134

作者:Chunlin Ji,Hanchu Shen,Zhan Xiong,Feng Chen,Meiying Zhang,Huiwen Yang 机构:Shenzhen Origin AI Technology Co. Ltd, Department of Computer Science and Engineering, Southern University of Science, and Technology, Department of Electrical Engineering and Computer Sciences, University of, California, Berkeley 摘要:广义零样本学习(GZSL)仍然是深度学习的技术挑战,因为它必须在没有目标类数据的情况下识别源类和目标类。为了在仅使用源类数据进行训练时保持源类和目标类之间的语义关系,我们从信息论的角度对知识转移和语义关系进行了量化。为此,我们遵循原型模型,将关注的变量格式化为概率向量。利用所提出的概率向量表示,可以用简单的闭合形式有效地评估互信息和熵等信息度量。我们讨论了使用原型模型时公共嵌入空间和距离函数的选择。然后,我们提出了确定性GZSL模型的三个信息论损失函数:连接已见(seen)类数据和目标类的互信息损失;当使用已见类数据学习目标类的嵌入时,防止过度拟合的不确定性感知熵约束损失;在将语义表示映射到公共空间时,用于保持语义关系的语义保持交叉熵损失。仿真表明,作为一种确定性模型,我们提出的方法在GZSL基准数据集上获得了最新的结果。我们相对基线模型——深度校准网络(DCN)——取得了21%-64%的改进,并首次证明确定性模型可以和生成性模型表现得一样好。此外,我们提出的模型与生成模型兼容。仿真研究表明,通过与f-CLSWGAN结合,我们获得了可与先进生成模型相比较的结果。 摘要:Generalized zero shot learning (GZSL) is still a technical challenge of deep learning as it has to recognize both source and target classes without data from target classes. To preserve the semantic relation between source and target classes when only trained with data from source classes, we address the quantification of the knowledge transfer and semantic relation from an information-theoretic viewpoint. To this end, we follow the prototypical model and format the variables of concern as a probability vector. Leveraging on the proposed probability vector representation, the information measurement such as mutual information and entropy, can be effectively evaluated with simple closed forms. We discuss the choice of common embedding space and distance function when using the prototypical model. Then we propose three information-theoretic loss functions for deterministic GZSL model: a mutual information loss to bridge seen data and target classes; an uncertainty-aware entropy constraint loss to prevent overfitting when using seen data to learn the embedding of target classes; a semantic preserving cross entropy loss to preserve the semantic relation when mapping the semantic representations to the common space. Simulation shows that, as a deterministic model, our proposed method obtains state-of-the-art results on GZSL benchmark datasets. We achieve 21%-64% improvements over the baseline model -- deep calibration network (DCN) and for the first time demonstrate a deterministic model can perform as well as generative ones. Moreover, our proposed model is compatible with generative models. Simulation studies show that by incorporating with f-CLSWGAN, we obtain comparable results compared with advanced generative models.

【2】 Transfer learning to improve streamflow forecasts in data sparse regions 标题:用于改进数据稀疏区域中的流量预测的转移学习 链接:https://arxiv.org/abs/2112.03088

作者:Roland Oruche,Lisa Egede,Tracy Baker,Fearghal O'Donncha 机构:Department of Electrical Engineering & Computer Science, University of Missouri–Columbia, USA, Human Computer Interaction Institute, Carnegie Mellon University, USA, The Nature Conservancy, New York, NY, USA, IBM Research Europe, IE 备注:9 pages, 5 figures, 1 table 摘要:有效的水资源管理需要在空间和时间上提供有关水资源质量和数量的信息。在本文中,我们通过微调和参数转移来研究迁移学习(TL)背后的方法,以便在数据稀疏区域中获得更好的径流预测泛化性能。我们提出了一种标准的长-短期记忆(LSTM)形式的递归神经网络,以适应足够大的源域数据集,并将学习到的权重重新调整到更小但相似的目标域数据集。我们提出了一种时空应用迁移学习方法,通过分离模型的空间和时间成分,并训练模型基于表示空间可变性的分类数据集进行泛化。该框架基于美国丰富的基准数据集开发,并基于肯尼亚自然保护协会收集的较小数据集进行评估。通过我们的TL技术,LSTM模型表现出了泛化性能。本次试验的结果表明,当使用知识转移和静态描述符改进数据稀疏区域的水文模型泛化时,预测径流响应的有效预测技巧。 摘要:Effective water resource management requires information on water availability, both in terms of quality and quantity, spatially and temporally. In this paper, we study the methodology behind Transfer Learning (TL) through fine-tuning and parameter transferring for better generalization performance of streamflow prediction in data-sparse regions. We propose a standard recurrent neural network in the form of Long Short-Term Memory (LSTM) to fit on a sufficiently large source domain dataset and repurpose the learned weights to a significantly smaller, yet similar target domain datasets. We present a methodology to implement transfer learning approaches for spatiotemporal applications by separating the spatial and temporal components of the model and training the model to generalize based on categorical datasets representing spatial variability. The framework is developed on a rich benchmark dataset from the US and evaluated on a smaller dataset collected by The Nature Conservancy in Kenya. The LSTM model exhibits generalization performance through our TL technique. Results from this current experiment demonstrate the effective predictive skill of forecasting streamflow responses when knowledge transferring and static descriptors are used to improve hydrologic model generalization in data-sparse regions.

【3】 Transfer Learning in Conversational Analysis through Reusing Preprocessing Data as Supervisors 标题:利用预处理数据作为监督者实现会话分析中的迁移学习 链接:https://arxiv.org/abs/2112.03032

作者:Joshua Yee Kim,Tongliang Liu,Kalina Yacef 机构:University of Sydney 备注:16 pages 摘要:会话分析系统使用有噪声的人类标签进行训练,在多模态特征提取过程中通常需要大量的预处理。在单任务学习中使用噪声标签会增加过度拟合的风险。在相同的训练过程中,辅助任务可以提高主要任务学习的绩效——这种方法是迁移学习和多任务学习(MTL)的交叉点。在本文中,我们将探讨如何将用于特征工程的预处理数据重新用作辅助任务,从而促进数据的生产性使用。我们的主要贡献是:(1)确定了16项有益的辅助任务,(2)研究了在主任务和辅助任务之间分配学习能力的方法,(3)研究了主任务和辅助任务之间的相对监督层次。在IEMOCAP和SEMAINE数据上的大量实验验证了与单任务方法相比的改进,并表明它可以推广到多个主要任务。 摘要:Conversational analysis systems are trained using noisy human labels and often require heavy preprocessing during multi-modal feature extraction. Using noisy labels in single-task learning increases the risk of over-fitting. Auxiliary tasks could improve the performance of the primary task learning during the same training -- this approach sits in the intersection of transfer learning and multi-task learning (MTL). In this paper, we explore how the preprocessed data used for feature engineering can be re-used as auxiliary tasks, thereby promoting the productive use of data. Our main contributions are: (1) the identification of sixteen beneficially auxiliary tasks, (2) studying the method of distributing learning capacity between the primary and auxiliary tasks, and (3) studying the relative supervision hierarchy between the primary and auxiliary tasks. Extensive experiments on IEMOCAP and SEMAINE data validate the improvements over single-task approaches, and suggest that it may generalize across multiple primary tasks.

【4】 Curriculum Meta-Learning for Few-shot Classification 标题:面向小样本分类的课程元学习 链接:https://arxiv.org/abs/2112.02913

作者:Emmanouil Stergiadis,Priyanka Agrawal,Oliver Squire 摘要:我们建议对课程训练框架进行调整,使其适用于小样本(few-shot)分类的最先进元学习技术。基于课程的训练通常试图通过逐步增加训练复杂性来模仿人类学习,从而实现增量概念学习。由于元学习者的目标是学习如何从尽可能少的样本中学习,因此这些样本的确切数量(即支持集的大小)自然可以反映给定任务的难度。我们定义了一个简单但新颖的课程计划,该计划从更大的支持集规模开始,并在整个训练过程中逐步减小,以最终匹配测试设置所需的 shot 数。该方法提高了学习效率和泛化能力。我们在两个小样本图像分类任务上使用MAML算法进行的实验表明,课程训练框架取得了显著的收益。消融研究证实了我们提出的方法与模型结构以及元学习超参数的独立性。 摘要:We propose an adaptation of the curriculum training framework, applicable to state-of-the-art meta learning techniques for few-shot classification. Curriculum-based training popularly attempts to mimic human learning by progressively increasing the training complexity to enable incremental concept learning. As the meta-learner's goal is learning how to learn from as few samples as possible, the exact number of those samples (i.e. the size of the support set) arises as a natural proxy of a given task's difficulty. We define a simple yet novel curriculum schedule that begins with a larger support size and progressively reduces it throughout training to eventually match the desired shot-size of the test setup. This proposed method boosts the learning efficiency as well as the generalization capability. Our experiments with the MAML algorithm on two few-shot image classification tasks show significant gains with the curriculum training framework. Ablation studies corroborate the independence of our proposed method from the model architecture as well as the meta-learning hyperparameters.
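下面用几行 Python 给出该课程调度的一个示意(起始/目标 support 大小与线性衰减方式均为假设):训练初期使用较大的 support set,并随训练进度逐步收缩,最终与测试所用的 shot 数一致。

```python
def support_size_schedule(step, total_steps, start_shots=25, target_shots=5):
    """从 start_shots 线性退火到 target_shots 的课程调度(示意)。"""
    frac = min(step / max(total_steps, 1), 1.0)
    return int(round(start_shots + (target_shots - start_shots) * frac))

# 元训练循环中的用法示意
for step in range(0, 10001, 2000):
    k = support_size_schedule(step, total_steps=10000)
    print(f"step {step}: 每个 episode 采样 {k}-shot 的 support set")
```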

【5】 AdaSTE: An Adaptive Straight-Through Estimator to Train Binary Neural Networks 标题:AdaSTE:一种用于训练二元神经网络的自适应直通估计器 链接:https://arxiv.org/abs/2112.02880

作者:Huu Le,Rasmus Kjær Høier,Che-Tsung Lin,Christopher Zach 机构:Chalmers University of Technology, Gothenburg, Sweden 备注:18 pages 摘要:提出了一种新的二元加权深度神经网络训练算法。特别地,我们首先将二元神经网络(BiNNs)的训练问题作为一个双层优化实例,然后构造该双层规划的灵活松弛。由此产生的训练方法与几种现有的BINN训练方法,特别是成功用于BinaryConnect和后续方法的直通梯度估计器,具有相同的算法简单性。事实上,我们提出的方法可以解释为原始直通估计器的自适应变体,该估计器在误差传播的反向过程中有条件(但并非总是)起到线性映射的作用。实验结果表明,与现有算法相比,新算法具有良好的性能。 摘要:We propose a new algorithm for training deep neural networks (DNNs) with binary weights. In particular, we first cast the problem of training binary neural networks (BiNNs) as a bilevel optimization instance and subsequently construct flexible relaxations of this bilevel program. The resulting training method shares its algorithmic simplicity with several existing approaches to train BiNNs, in particular with the straight-through gradient estimator successfully employed in BinaryConnect and subsequent methods. In fact, our proposed method can be interpreted as an adaptive variant of the original straight-through estimator that conditionally (but not always) acts like a linear mapping in the backward pass of error propagation. Experimental results demonstrate that our new algorithm offers favorable performance compared to existing approaches.
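作为背景,下面用 numpy 手写标准直通估计器(STE)在二值权重线性层上的前向/反向示意:前向用 sign 量化权重,反向把对二值权重的梯度直接"直通"回实值权重,并在 |w|>1 处截断。这只是原始 STE 的常见写法,阈值与更新方式为假设,并未展示论文提出的自适应变体 AdaSTE。

```python
import numpy as np

def ste_forward(w_real, x):
    """前向:用二值化权重 sign(w) 做线性变换。"""
    w_bin = np.sign(w_real)
    w_bin[w_bin == 0] = 1.0
    return x @ w_bin, w_bin

def ste_backward(w_real, x, grad_out):
    """反向:梯度直通回实值权重,并在 |w|>1 处置零(裁剪版 STE)。"""
    grad_w_bin = x.T @ grad_out
    return grad_w_bin * (np.abs(w_real) <= 1.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3)) * 0.5     # 实值"影子"权重
x = rng.normal(size=(2, 4))
y, w_bin = ste_forward(w, x)
grad_w = ste_backward(w, x, np.ones_like(y))
w -= 0.1 * grad_w                      # 用实值权重累积更新,下次前向再重新二值化
print(w_bin, grad_w, sep="\n")
```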

【6】 A Generalized Zero-Shot Quantization of Deep Convolutional Neural Networks via Learned Weights Statistics 标题:基于学习权重统计的深度卷积神经网络广义零样本量化 链接:https://arxiv.org/abs/2112.02834

作者:Prasen Kumar Sharma,Arun Abraham,Vikram Nelvoy Rajendiran 机构:. 备注:Accepted by IEEE Transactions on Multimedia 摘要:将浮点权重和深度卷积神经网络的激活量化为定点表示可以减少内存占用和推理时间。最近,人们正在努力实现Zero-Shot量化,这种量化不需要给定任务的原始未标记训练样本。这些发表得最好的作品严重依赖于学习的批量归一化(BN)参数来推断量化激活的范围。特别是,这些方法建立在经验估计框架或数据提取方法的基础上,用于计算激活范围。然而,当使用不容纳BN层的网络时,此类方案的性能严重下降。在这种思路下,我们提出了一种既不需要原始数据也不依赖于BN层统计的广义零拍量化(GZSQ)框架。我们使用了数据提取方法,仅利用模型的预训练权重来估计激活范围校准的丰富数据。据我们所知,这是第一个利用预训练权重分布来辅助零拍量化过程的工作。对于各种任务,拟议方案的性能明显优于现有的零炮工作,例如,MobileNet V2和其他几种w&w/o BN层模型的分类精度提高了约33%。我们还展示了所提出的工作在多个开源量化框架中的有效性。重要的是,我们的工作是对未来非规范化深度神经网络的训练后零炮量化的首次尝试。 摘要:Quantizing the floating-point weights and activations of deep convolutional neural networks to fixed-point representation yields reduced memory footprints and inference time. Recently, efforts have been afoot towards zero-shot quantization that does not require original unlabelled training samples of a given task. These best-published works heavily rely on the learned batch normalization (BN) parameters to infer the range of the activations for quantization. In particular, these methods are built upon either empirical estimation framework or the data distillation approach, for computing the range of the activations. However, the performance of such schemes severely degrades when presented with a network that does not accommodate BN layers. In this line of thought, we propose a generalized zero-shot quantization (GZSQ) framework that neither requires original data nor relies on BN layer statistics. We have utilized the data distillation approach and leveraged only the pre-trained weights of the model to estimate enriched data for range calibration of the activations. To the best of our knowledge, this is the first work that utilizes the distribution of the pretrained weights to assist the process of zero-shot quantization. The proposed scheme has significantly outperformed the existing zero-shot works, e.g., an improvement of ~ 33% in classification accuracy for MobileNetV2 and several other models that are w & w/o BN layers, for a variety of tasks. We have also demonstrated the efficacy of the proposed work across multiple open-source quantization frameworks. Importantly, our work is the first attempt towards the post-training zero-shot quantization of futuristic unnormalized deep neural networks.

【7】 End-to-end Adaptive Distributed Training on PaddlePaddle 标题:基于PaddlePaddle的端到端自适应分布式训练 链接:https://arxiv.org/abs/2112.02752

作者:Yulong Ao,Zhihua Wu,Dianhai Yu,Weibao Gong,Zhiqing Kui,Minxu Zhang,Zilingfeng Ye,Liang Shen,Yanjun Ma,Tian Wu,Haifeng Wang,Wei Zeng,Chao Yang 机构: Baidu Inc., Peking University 备注:16 pages, 10 figures, 4 tables 摘要:分布式训练已成为训练处理海量数据的大型神经网络(NN)模型的一种普遍而有效的方法。然而,要满足不同的神经网络模型、不同的计算资源及其在训练工作中的动态变化的需求是非常具有挑战性的。在本研究中,我们以系统的端到端视图设计了分布式训练框架,通过充分考虑资源分配、模型划分、任务布置和分布式执行,为不同场景(尤其是工业应用和生产环境)提供内置的自适应能力。基于统一分布图和统一集群对象,我们的自适应框架配备了全局代价模型和全局规划器,可以实现任意并行、资源感知布局、多模式执行、容错和弹性分布式训练。实验表明,该框架能够满足应用程序的多样性和资源的异构性等方面的需求,具有很强的竞争力。具有2600亿个参数的ERNIE语言模型在数千个AI处理器上得到有效训练,可扩展性差91.7%。通过采用异构流水线异步执行,来自推荐系统的模型吞吐量可分别提高到仅GPU和仅CPU训练的2.1倍和3.3倍。此外,容错和弹性分布式训练已成功应用于在线工业应用中,使长期训练作业失败的数量减少了34.49%,生产环境中的全局调度效率提高了33.91%。 摘要:Distributed training has become a pervasive and effective approach for training a large neural network (NN) model with processing massive data. However, it is very challenging to satisfy requirements from various NN models, diverse computing resources, and their dynamic changes during a training job. In this study, we design our distributed training framework in a systematic end-to-end view to provide the built-in adaptive ability for different scenarios, especially for industrial applications and production environments, by fully considering resource allocation, model partition, task placement, and distributed execution. Based on the unified distributed graph and the unified cluster object, our adaptive framework is equipped with a global cost model and a global planner, which can enable arbitrary parallelism, resource-aware placement, multi-mode execution, fault-tolerant, and elastic distributed training. The experiments demonstrate that our framework can satisfy various requirements from the diversity of applications and the heterogeneity of resources with highly competitive performance. The ERNIE language model with 260 billion parameters is efficiently trained on thousands of AI processors with 91.7% weak scalability. The throughput of the model from the recommender system by employing the heterogeneous pipeline asynchronous execution can be increased up to 2.1 times and 3.3 times that of the GPU-only and CPU-only training respectively. Moreover, the fault-tolerant and elastic distributed training have been successfully applied to the online industrial applications, which give a reduction of 34.49% in the number of failed long-term training jobs and an increase of 33.91% for the global scheduling efficiency in the production environment.

【8】 TransBoost: A Boosting-Tree Kernel Transfer Learning Algorithm for Improving Financial Inclusion 标题:TransBoost:一种改进金融包容性的Boost-Tree核转移学习算法 链接:https://arxiv.org/abs/2112.02365

作者:Yiheng Sun,Tian Lu,Cong Wang,Yuan Li,Huaiyu Fu,Jingran Dong,Yunjie Xu 机构: Tencent Weixin Group, Heinz College, Carnegie Mellon University, Guanghua School of Management, Peking University, School of Management, Fudan University 摘要:移动和金融技术的繁荣孕育了各种金融产品,并将其扩展到更广泛的人群,这有助于倡导金融包容。它具有减少金融不平等的非同寻常的社会效益。然而,由于新用户的显著特征分布和有限的信用历史,以及新进入的公司在处理复杂数据和获得准确标签方面缺乏经验,个人金融风险评估面临技术挑战,阻碍了进一步促进金融包容。为了应对这些挑战,本文结合基于树的模型和核方法的优点,提出了一种新的迁移学习算法(TransBoost)。TransBoost采用了并行树结构和有效的权值更新机制,在理论上有保证,这使得它能够在处理具有高维特征和稀疏性的真实世界数据时表现出色,时间复杂度为$O(n)$。我们在腾讯移动支付的两个公共数据集和一个独特的大规模数据集上进行了广泛的实验。结果表明,TransBoost在预测精度和效率方面优于其他最先进的基准转移学习算法,对数据稀疏性具有更强的鲁棒性,并提供有意义的模型解释。此外,鉴于金融风险水平,TransBoost使金融服务提供商能够为最大数量的用户提供服务,包括那些可能被其他算法排除在外的用户。也就是说,TransBoost改善了金融包容性。 摘要:The prosperity of mobile and financial technologies has bred and expanded various kinds of financial products to a broader scope of people, which contributes to advocating financial inclusion. It has non-trivial social benefits of diminishing financial inequality. However, the technical challenges in individual financial risk evaluation caused by the distinct characteristic distribution and limited credit history of new users, as well as the inexperience of newly-entered companies in handling complex data and obtaining accurate labels, impede further promoting financial inclusion. To tackle these challenges, this paper develops a novel transfer learning algorithm (i.e., TransBoost) that combines the merits of tree-based models and kernel methods. The TransBoost is designed with a parallel tree structure and efficient weights updating mechanism with theoretical guarantee, which enables it to excel in tackling real-world data with high dimensional features and sparsity in $O(n)$ time complexity. We conduct extensive experiments on two public datasets and a unique large-scale dataset from Tencent Mobile Payment. The results show that the TransBoost outperforms other state-of-the-art benchmark transfer learning algorithms in terms of prediction accuracy with superior efficiency, shows stronger robustness to data sparsity, and provides meaningful model interpretation. Besides, given a financial risk level, the TransBoost enables financial service providers to serve the largest number of users including those who would otherwise be excluded by other algorithms. That is, the TransBoost improves financial inclusion.

【9】 Adaptive label thresholding methods for online multi-label classification 标题:一种在线多标签分类的自适应标签阈值方法 链接:https://arxiv.org/abs/2112.02301

作者:Tingting Zhai,Hongcheng Tang,Hao Wang 机构:College of Information Engineering, Yangzhou University, Yangzhou, China, School of Computer and Software, Nanjing University of Information Science and, Technology (NUIST), China 备注:31 pages, 2 figures 摘要:现有的在线多标签分类算法不能很好地处理在线标签阈值问题,对其在线算法缺乏遗憾分析。本文提出了一种新的在线多标签分类自适应标签阈值算法框架,旨在克服现有方法的缺点。该框架的主要特点是,评分和阈值模型都是在线多标签分类器的重要组成部分,并被合并到一个在线优化问题中。此外,为了建立评分和阈值模型之间的关系,推导了一种新的多标签分类损失函数,该函数度量多标签分类器在多大程度上能够区分传入实例的相关标签和无关标签。基于这个新的框架和损失函数,我们提出了一阶线性算法和二阶线性算法,这两种算法都有封闭形式的更新,但需要使用不同的技术来更新多标签分类器。这两种算法都被证明可以实现一个次线性的遗憾。使用Mercer核,我们的一阶算法已经扩展到处理非线性多标签预测任务。实验表明,我们的线性和非线性算法在各种多标签性能指标方面具有优势。 摘要:Existing online multi-label classification works cannot well handle the online label thresholding problem and lack the regret analysis for their online algorithms. This paper proposes a novel framework of adaptive label thresholding algorithms for online multi-label classification, with the aim to overcome the drawbacks of existing methods. The key feature of our framework is that both scoring and thresholding models are included as important components of the online multi-label classifier and are incorporated into one online optimization problem. Further, in order to establish the relationship between scoring and thresholding models, a novel multi-label classification loss function is derived, which measures to what an extent the multi-label classifier can distinguish between relevant labels and irrelevant ones for an incoming instance. Based on this new framework and loss function, we present a first-order linear algorithm and a second-order one, which both enjoy closed form update, but rely on different techniques for updating the multi-label classifier. Both algorithms are proved to achieve a sub-linear regret. Using Mercer kernels, our first-order algorithm has been extended to deal with nonlinear multi-label prediction tasks. Experiments show the advantage of our linear and nonlinear algorithms, in terms of various multi-label performance metrics.
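下面给出一个示意性的损失函数代码(铰链形式),仅用于说明"打分模型与阈值模型联合优化、区分相关与无关标签"这一思路;其具体形式与论文推导的损失并不相同,变量与间隔参数均为假设:

```python
import numpy as np

def score_threshold_loss(scores, y, theta, margin=1.0):
    # scores: 某实例各标签的得分向量; y: 0/1 相关性标注; theta: 该实例的阈值(由阈值模型给出)
    # 相关标签的得分应高出阈值至少 margin,无关标签应低于阈值至少 margin
    rel_loss = np.maximum(0.0, margin - (scores - theta)) * y
    irr_loss = np.maximum(0.0, margin + (scores - theta)) * (1 - y)
    return rel_loss.sum() + irr_loss.sum()

print(score_threshold_loss(np.array([2.0, 0.3, -1.2]), np.array([1, 0, 0]), theta=0.5))
```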

【10】 Distributed Adaptive Learning Under Communication Constraints 标题:通信约束下的分布式自适应学习 链接:https://arxiv.org/abs/2112.02129

作者:Marco Carpentiero,Vincenzo Matta,Ali H. Sayed 机构: and also with the National Inter-UniversityConsortium for Telecommunications (CNIT) 备注:Submitted for publication 摘要:这项工作考察了设计用于在通信约束下运行的自适应分布式学习策略。我们考虑一个代理网络,必须解决连续优化的流数据的在线优化问题。代理实现了一种分布式协作策略,允许每个代理与其邻居进行本地信息交换。为了应对通信限制,交换的信息必须不可避免地被压缩。我们提出了一种昵称为ACTC(Adapt-Compress-Then-Combine)的扩散策略,该策略依赖于以下步骤:i)一个自适应步骤,其中每个代理以恒定的步长执行单个随机梯度更新;ii)利用最近引入的一类随机压缩算子的压缩步骤;以及iii)组合步骤,其中每个代理组合从其邻居接收的压缩更新。这项工作的特点如下。首先,我们关注自适应策略,其中恒定(而不是减小)步长对于实时响应非平稳变化至关重要。其次,我们考虑有向图和左随机组合策略的一般类,这使得我们能够增强拓扑和学习之间的相互作用。第三,与假设所有个体代理的成本函数均为强凸性的相关工作相反,我们只要求在网络级别上具有强凸性,即使单个代理具有强凸性成本,而其余代理具有非凸性成本,也满足该条件。第四,我们注重传播(而不是共识)战略。在压缩信息的要求设置下,我们确定ACTC迭代在所需优化器周围波动,在相邻代理之间交换比特方面实现显著节约。 摘要:This work examines adaptive distributed learning strategies designed to operate under communication constraints. We consider a network of agents that must solve an online optimization problem from continual observation of streaming data. The agents implement a distributed cooperative strategy where each agent is allowed to perform local exchange of information with its neighbors. In order to cope with communication constraints, the exchanged information must be unavoidably compressed. We propose a diffusion strategy nicknamed as ACTC (Adapt-Compress-Then-Combine), which relies on the following steps: i) an adaptation step where each agent performs an individual stochastic-gradient update with constant step-size; ii) a compression step that leverages a recently introduced class of stochastic compression operators; and iii) a combination step where each agent combines the compressed updates received from its neighbors. The distinguishing elements of this work are as follows. First, we focus on adaptive strategies, where constant (as opposed to diminishing) step-sizes are critical to respond in real time to nonstationary variations. Second, we consider the general class of directed graphs and left-stochastic combination policies, which allow us to enhance the interplay between topology and learning. Third, in contrast with related works that assume strong convexity for all individual agents' cost functions, we require strong convexity only at a network level, a condition satisfied even if a single agent has a strongly-convex cost and the remaining agents have non-convex costs. Fourth, we focus on a diffusion (as opposed to consensus) strategy. Under the demanding setting of compressed information, we establish that the ACTC iterates fluctuate around the desired optimizer, achieving remarkable savings in terms of bits exchanged between neighboring agents.
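以下是ACTC三个步骤(自适应—压缩—组合)的一个极简NumPy示意,省略了论文中的误差记忆、差分压缩等细节;压缩算子与组合矩阵均为假设示例,并非论文的官方实现:

```python
import numpy as np

def rand_k_compress(v, k):
    # 无偏随机稀疏化压缩算子:随机保留 k 个坐标并按 d/k 放大
    idx = np.random.choice(v.size, k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (v.size / k)
    return out

def actc_step(W, G, A, k, mu=0.05):
    # W: (N, d) 各代理当前参数; G: (N, d) 各代理的随机梯度; A: (N, N) 左随机组合矩阵(列和为 1)
    adapted = W - mu * G                                             # 1) 自适应:恒定步长的本地 SGD
    compressed = np.stack([rand_k_compress(x, k) for x in adapted])  # 2) 压缩后发送给邻居
    return A.T @ compressed                                          # 3) 组合:对邻居的压缩更新加权

N, d = 4, 10
A = np.full((N, N), 1.0 / N)             # 假设全连接、均匀权重的组合矩阵
W = np.random.randn(N, d)
G = np.random.randn(N, d)
print(actc_step(W, G, A, k=3).shape)      # (4, 10)
```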

强化学习(5篇)

【1】 Functional Regularization for Reinforcement Learning via Learned Fourier Features 标题:基于学习傅立叶特征的强化学习函数正则化 链接:https://arxiv.org/abs/2112.03257

作者:Alexander C. Li,Deepak Pathak 机构:Carnegie Mellon University 备注:Accepted at NeurIPS 2021. Website at this https URL 摘要:我们提出了一种简单的深度强化学习体系结构,通过将输入嵌入到学习的傅里叶基中,并表明它提高了基于状态和基于图像的RL的采样效率。我们使用神经切线核对我们的体系结构进行了无限宽度分析,并从理论上表明,调整傅里叶基的初始方差相当于对学习的深层网络进行函数正则化。也就是说,这些学习到的傅立叶特征允许调整网络在训练数据中的欠拟合或过拟合不同频率的程度,并因此提供一种受控机制来改进RL优化的稳定性和性能。从经验上讲,这使我们能够优先学习低频函数,并通过在优化过程中降低网络对噪声的敏感性来加快学习速度,例如在贝尔曼更新期间。在标准的基于状态和基于图像的RL基准测试上的实验表明,我们的体系结构明显优于基线。网址:https://alexanderli.com/learned-fourier-features 摘要:We propose a simple architecture for deep reinforcement learning by embedding inputs into a learned Fourier basis and show that it improves the sample efficiency of both state-based and image-based RL. We perform infinite-width analysis of our architecture using the Neural Tangent Kernel and theoretically show that tuning the initial variance of the Fourier basis is equivalent to functional regularization of the learned deep network. That is, these learned Fourier features allow for adjusting the degree to which networks underfit or overfit different frequencies in the training data, and hence provide a controlled mechanism to improve the stability and performance of RL optimization. Empirically, this allows us to prioritize learning low-frequency functions and speed up learning by reducing networks' susceptibility to noise in the optimization process, such as during Bellman updates. Experiments on standard state-based and image-based RL benchmarks show clear benefits of our architecture over the baselines. Website at https://alexanderli.com/learned-fourier-features
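下面是"学习型傅里叶特征"层的一个PyTorch示意实现(非论文官方代码),其中初始标准差 sigma 即文中用于控制函数正则化强度的超参数:

```python
import math
import torch
import torch.nn as nn

class LearnedFourierFeatures(nn.Module):
    # 将输入做可学习的线性投影后取 sin/cos,作为后续策略/价值网络的输入特征
    def __init__(self, in_dim, n_features, sigma=1.0):
        super().__init__()
        self.proj = nn.Linear(in_dim, n_features)
        nn.init.normal_(self.proj.weight, std=sigma)   # 调小 sigma 会偏向低频、更平滑的函数
        nn.init.zeros_(self.proj.bias)

    def forward(self, x):
        z = 2 * math.pi * self.proj(x)
        return torch.cat([torch.sin(z), torch.cos(z)], dim=-1)

phi = LearnedFourierFeatures(in_dim=4, n_features=64, sigma=0.5)
print(phi(torch.randn(8, 4)).shape)   # torch.Size([8, 128])
```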

【2】 Hierarchical Reinforcement Learning with Timed Subgoals 标题:具有定时子目标的分层强化学习 链接:https://arxiv.org/abs/2112.03100

作者:Nico Gürtler,Dieter Büchler,Georg Martius 机构:Max Planck Institute for Intelligent Systems, Tübingen, Germany 备注:Published at NeurIPS 2021. Code available at this https URL 摘要:分层强化学习(HRL)在具有挑战性的长视距任务中具有极大的样本有效学习潜力。特别是,让较高级别的人将子目标分配给较低级别的人,可以快速解决难题。然而,这种基于子目标的方法是在考虑静态强化学习环境的情况下设计的,因此,即使在现实世界的问题中普遍存在动态元素,它们也会与agent无法直接控制的动态元素进行斗争。在本文中,我们介绍了带有时间子目标(HiTS)的分层强化学习,这是一种HRL算法,它使代理能够通过不仅指定要达到的目标状态,而且还指定何时达到目标状态来调整其时间以适应动态环境。我们讨论了如何与较低级别的人就这些定时子目标进行沟通,从而为较高级别的人带来更稳定的学习问题。我们在一系列标准基准测试和三个新的具有挑战性的动态强化学习环境上的实验表明,我们的方法能够在现有的基于子目标的HRL方法无法学习稳定解的情况下进行样本有效学习。 摘要:Hierarchical reinforcement learning (HRL) holds great potential for sample-efficient learning on challenging long-horizon tasks. In particular, letting a higher level assign subgoals to a lower level has been shown to enable fast learning on difficult problems. However, such subgoal-based methods have been designed with static reinforcement learning environments in mind and consequently struggle with dynamic elements beyond the immediate control of the agent even though they are ubiquitous in real-world problems. In this paper, we introduce Hierarchical reinforcement learning with Timed Subgoals (HiTS), an HRL algorithm that enables the agent to adapt its timing to a dynamic environment by not only specifying what goal state is to be reached but also when. We discuss how communicating with a lower level in terms of such timed subgoals results in a more stable learning problem for the higher level. Our experiments on a range of standard benchmarks and three new challenging dynamic reinforcement learning environments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.

【3】 Benchmark for Out-of-Distribution Detection in Deep Reinforcement Learning 标题:深度强化学习中非分布检测的基准测试 链接:https://arxiv.org/abs/2112.02694

作者:Aaqib Parvez Mohammed,Matias Valdenegro-Toro 机构: Bonn-Rhein-Sieg University of Applied Sciences, Germany., German Research Center for Artificial Intelligence, Bremen, Germany. 备注:9 pages, 5 figures, 5 tables, Bayesian Deep Learning Workshop @ NeurIPS 2021 摘要:基于强化学习(RL)的解决方案被广泛应用于机器人、医疗保健和工业自动化等领域。最关注的是这些解决方案何时能很好地工作,但当它们出现分布外的输入时就会失败。RL策略与大多数机器学习模型具有相同的缺陷。文献中通常没有很好地介绍RL的分布外检测,并且缺乏这项任务的基准。在这项工作中,我们提出了一个基准来评估强化学习环境中的OOD检测方法,通过修改非视觉标准环境的物理参数或破坏视觉环境的状态观测。我们讨论了生成能够生成OOD数据的定制RL环境的方法,并评估了用于OOD检测任务的三种不确定性方法。我们的结果表明,集成方法在多个环境中具有较低的标准差,具有最佳的OOD检测性能。 摘要:Reinforcement Learning (RL) based solutions are being adopted in a variety of domains including robotics, health care and industrial automation. Most focus is given to when these solutions work well, but they fail when presented with out of distribution inputs. RL policies share the same faults as most machine learning models. Out of distribution detection for RL is generally not well covered in the literature, and there is a lack of benchmarks for this task. In this work we propose a benchmark to evaluate OOD detection methods in a Reinforcement Learning setting, by modifying the physical parameters of non-visual standard environments or corrupting the state observation for visual environments. We discuss ways to generate custom RL environments that can produce OOD data, and evaluate three uncertainty methods for the OOD detection task. Our results show that ensemble methods have the best OOD detection performance with a lower standard deviation across multiple environments.
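该基准中表现最好的集成方法,其OOD分数通常可用成员间的预测分歧来近似;下面是一个极简示意(非该基准的官方实现,阈值需另行标定):

```python
import numpy as np

def ensemble_ood_score(q_values):
    # q_values: (E, A) —— E 个集成成员对同一状态各动作的 Q 值估计
    # 用成员间的方差(分歧)作为不确定性分数,超过阈值即判为分布外状态
    return q_values.var(axis=0).mean()

q = np.array([[1.0, 0.2], [0.9, 0.1], [1.1, 0.3]])
print(ensemble_ood_score(q))
```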

【4】 Math Programming based Reinforcement Learning for Multi-Echelon Inventory Management 标题:基于数学规划的强化学习在多级库存管理中的应用 链接:https://arxiv.org/abs/2112.02215

作者:Pavithra Harsha,Ashish Jagmohan,Jayant R. Kalagnanam,Brian Quanz,Divya Singhvi 机构: IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY , USA, Stern School of Business, New York University, New York, NY , USA 备注:Accepted to NeurIPS 2021 Deep RL Workshop. Authors are listed in alphabetical order 摘要:强化学习在机器人、游戏和许多其他领域取得了重大突破。但是,在复杂的现实决策问题中,RL的应用仍然有限。运营管理中的许多问题(例如库存和收入管理)的特点是大行动空间和随机系统动力学。这些特点使得现有的RL方法很难解决问题,这些方法依赖枚举技术来解决每一步动作问题。为了解决这些问题,我们开发了可编程参与者强化学习(PARL),这是一种策略迭代方法,使用整数规划和样本平均近似的技术。通过分析,我们证明了对于给定的批评家,当潜在的不确定性样本趋于无穷大时,每次迭代中学习的策略收敛到最优策略。事实上,我们证明了对潜在的不确定性分布进行适当选择的离散化可以产生接近最优的参与者策略,即使来自潜在不确定性的样本很少。然后,我们将我们的算法应用于具有复杂供应链结构的实际库存管理问题,并表明PARL在这些设置下优于最先进的RL和库存优化方法。我们发现,在不同的供应链环境中,PARL平均比常用的基本库存启发式方法高出44.7%,而表现最好的RL方法平均高达12.1%。 摘要:Reinforcement learning has lead to considerable break-throughs in diverse areas such as robotics, games and many others. But the application to RL in complex real-world decision making problems remains limited. Many problems in operations management (inventory and revenue management, for example) are characterized by large action spaces and stochastic system dynamics. These characteristics make the problem considerably harder to solve for existing RL methods that rely on enumeration techniques to solve per step action problems. To resolve these issues, we develop Programmable Actor Reinforcement Learning (PARL), a policy iteration method that uses techniques from integer programming and sample average approximation. Analytically, we show that the for a given critic, the learned policy in each iteration converges to the optimal policy as the underlying samples of the uncertainty go to infinity. Practically, we show that a properly selected discretization of the underlying uncertain distribution can yield near optimal actor policy even with very few samples from the underlying uncertainty. We then apply our algorithm to real-world inventory management problems with complex supply chain structures and show that PARL outperforms state-of-the-art RL and inventory optimization methods in these settings. We find that PARL outperforms commonly used base stock heuristic by 44.7% and the best performing RL method by up to 12.1% on average across different supply chain environments.

【5】 Intelligent Trading Systems: A Sentiment-Aware Reinforcement Learning Approach 标题:智能交易系统:一种情感感知强化学习方法 链接:https://arxiv.org/abs/2112.02095

作者:Francisco Caio Lima Paiva,Leonardo Kanashiro Felizardo,Reinaldo Augusto da Costa Bianchi,Anna Helena Reali Costa 机构:Universidade de São Paulo, São Paulo, SP, Brazil, Centro Universitário FEI, São Bernardo do Campo, SP, Brazil 备注:9 pages, 5 figures, To appear in the Proceedings of the 2nd ACM International Conference on AI in Finance (ICAIF'21), November 3-5, 2021, Virtual Event, USA 摘要:长期以来,基于模式识别在证券交易所对单一资产进行可盈利交易的可行性一直吸引着研究人员。强化学习(RL)和自然语言处理在这类单一资产交易任务中已受到广泛关注,但只有少数工作探索了二者的结合。此外,一些问题仍未得到解决,例如通过明确捕获反映市场随时间变化状况的情绪特征来提取市场情绪动量,以及评估RL结果在不同情形下的一致性和稳定性。为填补这一空白,我们提出了情绪感知RL(SentARL)智能交易系统,该系统通过从文本新闻中提取自适应数量的过去情绪特征来利用市场情绪,从而提高利润稳定性。我们在20种资产、两种交易成本以及五种不同的时间段和初始化设置上评估了SentARL,以显示其相对于基线的一致有效性。随后,这一全面的评估使我们能够确定新闻覆盖度与市场情绪在价格时间序列相关性方面的界限;在该界限之上,SentARL的有效性十分突出。 摘要:The feasibility of making profitable trades on a single asset on stock exchanges based on patterns identification has long attracted researchers. Reinforcement Learning (RL) and Natural Language Processing have gained notoriety in these single-asset trading tasks, but only a few works have explored their combination. Moreover, some issues are still not addressed, such as extracting market sentiment momentum through the explicit capture of sentiment features that reflect the market condition over time and assessing the consistency and stability of RL results in different situations. Filling this gap, we propose the Sentiment-Aware RL (SentARL) intelligent trading system that improves profit stability by leveraging market mood through an adaptive amount of past sentiment features drawn from textual news. We evaluated SentARL across twenty assets, two transaction costs, and five different periods and initializations to show its consistent effectiveness against baselines. Subsequently, this thorough assessment allowed us to identify the boundary between news coverage and market sentiment regarding the correlation of price-time series above which SentARL's effectiveness is outstanding.

医学相关(4篇)

【1】 Joint Learning of Localized Representations from Medical Images and Reports 标题:医学图像和报告本地化表征的联合学习 链接:https://arxiv.org/abs/2112.02889

作者:Philip Müller,Georgios Kaissis,Congyu Zou,Daniel Rückert 机构: Institute for Artificial Intelligence and Informatics in Medicine, Department of Informatics, Institute of Diagnostic and Interventional Radiology, Technical University of Munich, Department of Computing, Imperial College London 备注:14 pages, 3 figures, 2 tables 摘要:对比学习已被证明对未标记数据的预训练图像模型是有效的,对于医学图像分类等任务具有良好的效果。在训练前使用成对的文本和图像(如放射报告和图像)进一步改善了结果。尽管如此,大多数现有的方法将图像分类作为下游任务,对于语义分割或对象检测等局部任务可能不是最优的。因此,据我们所知,我们提出了基于视觉和文本的局部表征学习(LoVT),这是第一种针对局部医学成像任务的文本监督预训练方法。该方法将实例级的图像-报表对比学习与图像区域和报表语句表示的局部对比学习相结合。我们在一个新的评估框架上评估LoVT和常用的预训练方法,该框架由来自五个公共数据集的18个胸部X光局部任务组成。虽然没有单一的最佳方法,但LoVT在18项研究任务中的11项表现最好,因此它是本地化任务的首选方法。 摘要:Contrastive learning has proven effective for pre-training image models on unlabeled data with promising results for tasks such as medical image classification. Using paired text and images (such as radiological reports and images) during pre-training improved the results even further. Still, most existing methods target image classification as downstream tasks and may not be optimal for localized tasks like semantic segmentation or object detection. We therefore propose Localized representation learning from Vision and Text (LoVT), to our best knowledge, the first text-supervised pre-training method that targets localized medical imaging tasks. Our method combines instance-level image-report contrastive learning with local contrastive learning on image region and report sentence representations. We evaluate LoVT and commonly used pre-training methods on a novel evaluation framework consisting of 18 localized tasks on chest X-rays from five public datasets. While there is no single best method, LoVT performs best on 11 out of the 18 studied tasks making it the preferred method of choice for localized tasks.
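LoVT中的实例级图像-报告对比目标可以用CLIP风格的InfoNCE损失直观说明,下面是一个示意实现(不含论文中的区域/句子级局部对比部分,温度等超参数均为假设):

```python
import torch
import torch.nn.functional as F

def image_report_infonce(img_emb, txt_emb, tau=0.07):
    # img_emb/txt_emb: (B, d) 配对的图像与报告嵌入;同一行互为正样本,批内其余为负样本
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / tau
    labels = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

print(image_report_infonce(torch.randn(16, 128), torch.randn(16, 128)))
```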

【2】 Separated Contrastive Learning for Organ-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation 标题:基于有限标注的高危器官和肿瘤大体分割的分离对比学习 链接:https://arxiv.org/abs/2112.02743

作者:Jiacheng Wang,Xiaomeng Li,Yiming Han,Jing Qin,Liansheng Wang,Qichao Zhou 机构: Department of Computer Science at School of Informatics, Xiamen University, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology 备注:Accepted in AAAI-22 摘要:自动划定危险器官(OAR)和肿瘤总体积(GTV)对于放射治疗计划具有重要意义。然而,在有限的像素(体素)注释下学习功能强大的表示法以进行精确描绘是一项具有挑战性的任务。像素级的对比学习可以通过从未标记数据中学习密集表示来减轻对注释的依赖。这方面的最新研究在特征图上设计了各种对比损失,以产生地图中每个像素的鉴别特征。然而,同一地图中的像素不可避免地共享语义,使其比实际更接近,这可能会影响同一地图中像素的区分,并导致与其他地图中像素的不公平比较。为了解决这些问题,我们提出了一种分离的区域级对比学习方案,即SepaReg,其核心是将每个图像分割成多个区域,并分别对每个区域进行编码。具体而言,SepaReg包括两个组件:结构感知图像分离(SIS)模块和器官内和器官间蒸馏(IID)模块。SIS在结构信息的指导下对图像集进行操作,重建区域集。器官间表征将通过典型的跨区域对比学习。另一方面,IID被提议通过利用器官内表征来解决区域集合中的数量不平衡问题,因为微小的器官可能产生较少的区域。我们在一个公共数据集和两个私有数据集上进行了大量实验来评估所提出的模型。实验结果证明了该模型的有效性,其性能始终优于现有的方法。代码可在https://github.com/jcwang123/Separate_CL. 摘要:Automatic delineation of organ-at-risk (OAR) and gross-tumor-volume (GTV) is of great significance for radiotherapy planning. However, it is a challenging task to learn powerful representations for accurate delineation under limited pixel (voxel)-wise annotations. Contrastive learning at pixel-level can alleviate the dependency on annotations by learning dense representations from unlabeled data. Recent studies in this direction design various contrastive losses on the feature maps, to yield discriminative features for each pixel in the map. However, pixels in the same map inevitably share semantics to be closer than they actually are, which may affect the discrimination of pixels in the same map and lead to the unfair comparison to pixels in other maps. To address these issues, we propose a separated region-level contrastive learning scheme, namely SepaReg, the core of which is to separate each image into regions and encode each region separately. Specifically, SepaReg comprises two components: a structure-aware image separation (SIS) module and an intra- and inter-organ distillation (IID) module. The SIS is proposed to operate on the image set to rebuild a region set under the guidance of structural information. The inter-organ representation will be learned from this set via typical contrastive losses cross regions. On the other hand, the IID is proposed to tackle the quantity imbalance in the region set as tiny organs may produce fewer regions, by exploiting intra-organ representations. We conducted extensive experiments to evaluate the proposed model on a public dataset and two private datasets. The experimental results demonstrate the effectiveness of the proposed model, consistently achieving better performance than state-of-the-art approaches. Code is available at https://github.com/jcwang123/Separate_CL.

【3】 Real-time Virtual Intraoperative CT for Image Guided Surgery 标题:实时虚拟CT在图像引导手术中的应用 链接:https://arxiv.org/abs/2112.02608

作者:Yangming Li,Neeraja Konuthula,Ian M. Humphreys,Kris Moe,Blake Hannaford,Randall Bly 机构:Rochester Institute of Technology, RoCALab, Rochester, USA, University of Washington, Department of Otolaryngology–Head and Neck Surgery, Seattle, USA, University of Washington, BioRobotics Lab, Seattle, USA, Seattle Children’s Hospital, Seattle, USA 摘要:目的:本文提出一种在鼻内窥镜手术(ESS)中生成虚拟术中CT扫描的方案,以提高手术的完整性。方法:该工作提出了基于器械尖端运动、基于尖端轨迹和基于器械的三种方法,并结合非参数平滑和高斯过程回归,用于生成虚拟术中CT。结果:在尸体上进行的ESS实验中研究并比较了所提方法。手术结果表明,三种方法均将Dice相似系数提高到86%以上,F分数超过92%,精确率超过89.91%。其中基于尖端轨迹的方法性能最佳,在手术完整性评估中达到96.87%的精确率。结论:这项工作表明,虚拟术中CT扫描提高了实际手术场景与参考模型之间的一致性,并提高了ESS的手术完整性。与实际的术中CT扫描相比,该方案不影响现有的手术流程,不需要大多数ESS中已有硬件之外的额外硬件,克服了实际术中CT带来的高成本、重复辐射和麻醉时间延长等问题,在ESS中具有实用性。 摘要:Abstract. Purpose: This paper presents a scheme for generating virtual intraoperative CT scans in order to improve surgical completeness in Endoscopic Sinus Surgeries (ESS). Approach: The work presents three methods, the tip motion-based, the tip trajectory-based, and the instrument based, along with non-parametric smoothing and Gaussian Process Regression, for virtual intraoperative CT generation. Results: The proposed methods studied and compared on ESS performed on cadavers. Surgical results show all three methods improve the Dice Similarity Coefficients > 86%, with F-score > 92% and precision > 89.91%. The tip trajectory-based method was found to have best performance and reached 96.87% precision in surgical completeness evaluation. Conclusions: This work demonstrated that virtual intraoperative CT scans improves the consistency between the actual surgical scene and the reference model, and improves surgical completeness in ESS. Comparing with actual intraoperative CT scans, the proposed scheme has no impact on existing surgical protocols, does not require extra hardware other than the one is already available in most ESS overcome the high costs, the repeated radiation, and the elongated anesthesia caused by actual intraoperative CTs, and is practical in ESS.
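文中提到的高斯过程回归可以用scikit-learn快速示意:对带噪的器械尖端轨迹采样点做平滑回归并给出不确定性。以下仅为合成数据上的假设示例,并非论文的原始流程:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

t = np.linspace(0.0, 1.0, 40)[:, None]                               # 采样时间
traj = np.sin(2 * np.pi * t).ravel() + 0.05 * np.random.randn(40)    # 带噪的尖端位置(单维示意)
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.1) + WhiteKernel(1e-3))
gpr.fit(t, traj)
t_fine = np.linspace(0.0, 1.0, 200)[:, None]
mean, std = gpr.predict(t_fine, return_std=True)                     # 平滑后的轨迹及其不确定性
print(mean.shape, std.max())
```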

【4】 Classification of COVID-19 on chest X-Ray images using Deep Learning model with Histogram Equalization and Lungs Segmentation 标题:基于直方图均衡化和肺部分割的深度学习模型在胸片冠状病毒分类中的应用 链接:https://arxiv.org/abs/2112.02478

作者:Hitendra Singh Bhadouria,Krishan Kumar,Aman Swaraj,Karan Verma,Arshpreet Kaur,Shasvat Sharma,Ghanshyam Singh,Ashok Kumar,Leandro Melo de Sales 机构:National Institute of Technology, Delhi,; Indian Institute of Technology, Roorkee,; DIT University, Dehradun, Malaviya National Institute of Technology Jaipur,; Government Mahila Engineering, College, Ajmer,; Universidade Federal De Alagoas-UFAL, Brasil 备注:Total number of words of the manuscript- 6577 The number of words of the abstract- 238 The number of figures- 8 The number of tables- 10 摘要:背景与目的:人工智能(AI)方法与生物医学分析相结合,在大流行期间发挥着关键作用,有助于缓解医疗系统和医生所承受的巨大压力。随着COVID-19危机在巴西和印度等人口密集、检测试剂不足的国家不断恶化,放射影像可以作为重要的诊断工具,准确地对COVID-19患者进行分类并及时给予必要的治疗。基于这一动机,我们提出了一项基于深度学习结构、利用胸部X光片检测COVID-19感染肺部的研究。数据集:我们共收集了2470幅图像,涵盖三种类别标签,即健康肺部、普通肺炎和COVID-19感染性肺炎,其中470幅X射线图像属于COVID-19类别。方法:我们首先使用直方图均衡化技术对所有图像进行预处理,然后使用U-Net结构对其进行分割。随后使用VGG-16网络从预处理图像中提取特征,并通过SMOTE过采样技术进行采样以获得类别平衡的数据集。最后,利用支持向量机(SVM)分类器对类别平衡后的特征进行分类,采用十折交叉验证并评估准确率。结果与结论:我们将已知的预处理技术、特征提取方法和数据集平衡方法相结合的新方法,在包含2470幅X射线图像的数据集上对COVID-19图像达到了98%的识别率。因此,我们的模型适合在医疗机构中用于筛查目的。 摘要:Background and Objective: Artificial intelligence (AI) methods coupled with biomedical analysis has a critical role during pandemics as it helps to release the overwhelming pressure from healthcare systems and physicians. As the ongoing COVID-19 crisis worsens in countries having dense populations and inadequate testing kits like Brazil and India, radiological imaging can act as an important diagnostic tool to accurately classify covid-19 patients and prescribe the necessary treatment in due time. With this motivation, we present our study based on deep learning architecture for detecting covid-19 infected lungs using chest X-rays. Dataset: We collected a total of 2470 images for three different class labels, namely, healthy lungs, ordinary pneumonia, and covid-19 infected pneumonia, out of which 470 X-ray images belong to the covid-19 category. Methods: We first pre-process all the images using histogram equalization techniques and segment them using U-net architecture. VGG-16 network is then used for feature extraction from the pre-processed images which is further sampled by SMOTE oversampling technique to achieve a balanced dataset. Finally, the class-balanced features are classified using a support vector machine (SVM) classifier with 10-fold cross-validation and the accuracy is evaluated. Result and Conclusion: Our novel approach combining well-known pre-processing techniques, feature extraction methods, and dataset balancing method, lead us to an outstanding rate of recognition of 98% for COVID-19 images over a dataset of 2470 X-ray images. Our model is therefore fit to be utilized in healthcare facilities for screening purposes.
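下面用合成数据演示文中"直方图均衡化预处理 + SMOTE类平衡 + SVM十折交叉验证"这一环节的最小流程(U-Net分割与VGG-16特征提取此处省略;图像与特征均为随机生成的假设样本,并非论文数据):

```python
import numpy as np
import cv2
from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
img = (rng.random((64, 64)) * 255).astype(np.uint8)
img_eq = cv2.equalizeHist(img)                        # 预处理:直方图均衡化

X = rng.normal(size=(300, 64))                        # 假设这是 VGG-16 提取出的特征
y = np.r_[np.zeros(250), np.ones(50)].astype(int)     # 类别严重不平衡(阳性样本较少)
X[y == 1] += 1.0                                      # 让两类可分,便于演示
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
scores = cross_val_score(SVC(kernel="rbf"), X_bal, y_bal, cv=10)
print("十折交叉验证平均准确率:", scores.mean())
```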

蒸馏|知识提取(4篇)

【1】 Does constituency analysis enhance domain-specific pre-trained BERT models for relation extraction? 标题:成分分析是否增强了用于关系提取的特定领域预先训练的BERT模型? 链接:https://arxiv.org/abs/2112.02955

作者:Anfu Tang,Louise Deléger,Robert Bossy,Pierre Zweigenbaum,Claire Nédellec 机构:Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France, Université Paris-Saclay, CNRS, Laboratoire interdisciplinaire des sciences du numérique, Orsay, France 备注:None 摘要:近年来,人们对关系抽取进行了大量的研究。BioCreative VII的DrugProt track提供了一个手动注释的语料库,用于开发和评估关系提取系统,其中研究了化学品和基因之间的相互作用。我们描述了提交文件时使用的集合系统,该系统通过多数投票将微调bioBERT、sciBERT和const bioBERT模型的预测结合起来。我们用BERT测试了句法信息对关系提取的贡献。我们观察到,将基于成分的句法信息添加到BERT中提高了精确度,但降低了召回率,因为在序列集中很少看到的关系不太可能被注入句法信息的BERT模型预测。我们的代码可以在线获取[https://github.com/Maple177/drugprot-relation-extraction]. 摘要:Recently many studies have been conducted on the topic of relation extraction. The DrugProt track at BioCreative VII provides a manually-annotated corpus for the purpose of the development and evaluation of relation extraction systems, in which interactions between chemicals and genes are studied. We describe the ensemble system that we used for our submission, which combines predictions of fine-tuned bioBERT, sciBERT and const-bioBERT models by majority voting. We specifically tested the contribution of syntactic information to relation extraction with BERT. We observed that adding constituentbased syntactic information to BERT improved precision, but decreased recall, since relations rarely seen in the train set were less likely to be predicted by BERT models in which the syntactic information is infused. Our code is available online [https://github.com/Maple177/drugprot-relation-extraction].
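摘要中的多数投票集成可用几行代码说明(标签名称仅为示意,并非该系统的真实输出格式):

```python
from collections import Counter

def majority_vote(predictions):
    # predictions: 各个微调模型(如 bioBERT / sciBERT / const-bioBERT)对同一候选关系的预测标签
    label, _ = Counter(predictions).most_common(1)[0]
    return label

print(majority_vote(["INHIBITOR", "INHIBITOR", "NONE"]))   # -> INHIBITOR
```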

【2】 Causal Distillation for Language Models 标题:语言模型的因果提炼 链接:https://arxiv.org/abs/2112.02505

作者:Zhengxuan Wu,Atticus Geiger,Josh Rozner,Elisa Kreiss,Hanson Lu,Thomas Icard,Christopher Potts,Noah D. Goodman 机构:Stanford University 备注:7 pages, 2 figures 摘要:蒸馏方面的努力使得语言模型更加紧凑和高效,而不会造成性能的严重下降。标准的蒸馏方法针对两个目标训练学生模型:任务特定目标(例如语言建模)和一个模仿目标,后者鼓励学生模型的隐藏状态与更大的教师模型的隐藏状态相似。在本文中,我们证明了为蒸馏加入第三个目标是有益的,该目标通过交换干预训练(IIT)鼓励学生模仿教师的因果计算过程。IIT推动学生模型成为教师模型的因果抽象——即一个具有相同因果结构的更简单模型。IIT是完全可微的,易于实现,并可与其他目标灵活结合。与BERT的标准蒸馏相比,通过IIT进行蒸馏在Wikipedia(掩码语言建模)上获得了更低的困惑度,并在GLUE基准(自然语言理解)、SQuAD(问答)和CoNLL-2003(命名实体识别)上带来显著改进。 摘要:Distillation efforts have led to language models that are more compact and efficient without serious drops in performance. The standard approach to distillation trains a student model against two objectives: a task-specific objective (e.g., language modeling) and an imitation objective that encourages the hidden states of the student model to be similar to those of the larger teacher model. In this paper, we show that it is beneficial to augment distillation with a third objective that encourages the student to imitate the causal computation process of the teacher through interchange intervention training(IIT). IIT pushes the student model to become a causal abstraction of the teacher model - a simpler model with the same causal structure. IIT is fully differentiable, easily implemented, and combines flexibly with other objectives. Compared with standard distillation of BERT, distillation via IIT results in lower perplexity on Wikipedia (masked language modeling) and marked improvements on the GLUE benchmark (natural language understanding), SQuAD (question answering), and CoNLL-2003 (named entity recognition).
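作为背景,标准蒸馏的两个目标(任务损失 + 隐藏状态模仿)可以写成如下示意代码;论文在此基础上再加入第三个IIT目标,这里不作实现,损失形式与权重系数均为假设:

```python
import torch
import torch.nn.functional as F

def standard_distill_loss(student_logits, labels, student_hidden, teacher_hidden, alpha=1.0):
    # 目标一:任务特定损失(例如掩码语言建模的交叉熵)
    task_loss = F.cross_entropy(student_logits, labels)
    # 目标二:模仿损失,拉近学生与教师的隐藏状态(此处用余弦相似度示意)
    imitation_loss = 1.0 - F.cosine_similarity(student_hidden, teacher_hidden, dim=-1).mean()
    return task_loss + alpha * imitation_loss

loss = standard_distill_loss(torch.randn(8, 30522), torch.randint(0, 30522, (8,)),
                             torch.randn(8, 768), torch.randn(8, 768))
print(loss)
```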

【3】 KDCTime: Knowledge Distillation with Calibration on InceptionTime for Time-series Classification 标题:KDCTime:用于时间序列分类的基于InceptionTime的带校准知识蒸馏 链接:https://arxiv.org/abs/2112.02291

作者:Xueyuan Gong,Yain-Whar Si,Yongqi Tian,Cong Lin,Xinyuan Zhang,Xiaoxiang Liu 摘要:基于深度神经网络的时间序列分类方法在UCR数据集上容易过拟合,这是由这些数据集的少样本问题造成的。因此,为了缓解过拟合现象以进一步提高精度,我们首先提出了InceptionTime标签平滑(LSTime),它采用软标签信息而非仅使用硬标签。其次,为了由教师模型自动生成软标签而不是通过LSTime手动调整软标签,提出了InceptionTime知识蒸馏(KDTime)。最后,为了纠正教师模型预测错误的软标签,提出了带校准的InceptionTime知识蒸馏(KDCTime),其中包含两种可选的校准策略,即基于平移的KDC(KDCT)和基于重排序的KDC(KDCR)。实验结果表明,KDCTime的精度令人满意,在训练时间开销可接受的情况下,其推理时间比ROCKET快两个数量级。 摘要:Time-series classification approaches based on deep neural networks are easy to be overfitting on UCR datasets, which is caused by the few-shot problem of those datasets. Therefore, in order to alleviate the overfitting phenomenon for further improving the accuracy, we first propose Label Smoothing for InceptionTime (LSTime), which adopts the information of soft labels compared to just hard labels. Next, instead of manually adjusting soft labels by LSTime, Knowledge Distillation for InceptionTime (KDTime) is proposed in order to automatically generate soft labels by the teacher model. At last, in order to rectify the incorrect predicted soft labels from the teacher model, Knowledge Distillation with Calibration for InceptionTime (KDCTime) is proposed, where it contains two optional calibrating strategies, i.e. KDC by Translating (KDCT) and KDC by Reordering (KDCR). The experimental results show that the accuracy of KDCTime is promising, while its inference time is two orders of magnitude faster than ROCKET with an acceptable training time overhead.
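LSTime的标签平滑与KDTime的教师软标签生成可以用NumPy简单示意(平滑系数与温度均为假设值,且未包含KDCTime的校准步骤):

```python
import numpy as np

def smooth_labels(y_onehot, eps=0.1):
    # LSTime:把硬标签变为平滑软标签
    k = y_onehot.shape[-1]
    return y_onehot * (1.0 - eps) + eps / k

def teacher_soft_labels(teacher_logits, T=2.0):
    # KDTime:由教师模型的 logits 经温度软化得到软标签(softmax)
    z = teacher_logits / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

print(smooth_labels(np.eye(3)[0]))
print(teacher_soft_labels(np.array([2.0, 1.0, 0.1])))
```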

【4】 Global alignment for relation extraction in Microbiology 标题:微生物学中关系抽取的全局比对方法 链接:https://arxiv.org/abs/2112.02097

作者:Anfu Tang,Claire Nédellec,Pierre Zweigenbaum,Louise Deléger,Robert Bossy 机构:MaIAGE, INRAE, Université Paris-Saclay, Jouy-en-Josas, France, Limsi, CNRS, Université Paris-Saclay, Orsay, France 备注:None 摘要:我们研究了一种基于全局对齐和句法信息的文本关系提取方法。结合支持向量机,该方法在两个RE任务上的性能与LSTM相当,甚至更好。 摘要:We investigate a method to extract relations from texts based on global alignment and syntactic information. Combined with SVM, this method is shown to have a performance comparable or even better than LSTM on two RE tasks.

聚类(1篇)

【1】 Modification-Fair Cluster Editing 标题:修改-公平群集编辑 链接:https://arxiv.org/abs/2112.03183

作者:Vincent Froese,Leon Kellerhals,Rolf Niedermeier 机构:Technische Universit¨at Berlin, Faculty IV, Institute of Software Engineering and Theoretical, Computer Science, Algorithmics and Computational Complexity. 备注:Accepted at AAAI 2022 摘要:经典的聚类编辑问题(也称为相关聚类)要求通过少量的边修改将给定的图转换为不相交的团(簇)并。当应用于顶点着色图(表示子组的颜色)时,NP难聚类编辑问题的标准算法可能会产生偏向于数据子组(例如人口统计组)的解决方案,这些解决方案以涉及子组成员的修改数量来衡量。我们提出了一个修改公平性约束,确保每个子组的编辑次数与其大小成正比。首先,我们研究具有两个顶点颜色的图的修改公平聚类编辑。我们证明了该问题是NP难的,即使在一个子群中只能插入边;请注意,在经典的“非公平”设置中,这种情况是可以用多项式时间解决的。但是,在更一般的编辑形式中,修改公平变量相对于边编辑的数量保持固定参数可处理。我们通过对现实社会网络模型的实证分析来补充这些和进一步的理论结果,我们发现修改公平的价格低得惊人,也就是说,最优修改公平的成本与最优“非公平”解决方案的成本仅相差很小的百分比。 摘要:The classic Cluster Editing problem (also known as Correlation Clustering) asks to transform a given graph into a disjoint union of cliques (clusters) by a small number of edge modifications. When applied to vertex-colored graphs (the colors representing subgroups), standard algorithms for the NP-hard Cluster Editing problem may yield solutions that are biased towards subgroups of data (e.g., demographic groups), measured in the number of modifications incident to the members of the subgroups. We propose a modification fairness constraint which ensures that the number of edits incident to each subgroup is proportional to its size. To start with, we study Modification-Fair Cluster Editing for graphs with two vertex colors. We show that the problem is NP-hard even if one may only insert edges within a subgroup; note that in the classic "non-fair" setting, this case is trivially polynomial-time solvable. However, in the more general editing form, the modification-fair variant remains fixed-parameter tractable with respect to the number of edge edits. We complement these and further theoretical results with an empirical analysis of our model on real-world social networks where we find that the price of modification-fairness is surprisingly low, that is, the cost of optimal modification-fair differs from the cost of optimal "non-fair" solutions only by a small percentage.

自动驾驶|车辆|车道检测等(3篇)

【1】 Intelligent Acoustic Module for Autonomous Vehicles using Fast Gated Recurrent approach 标题:基于快速门控递归方法的自主车智能声学模块 链接:https://arxiv.org/abs/2112.03174

作者:Raghav Rawat,Shreyash Gupta,Shreyas Mohapatra,Sujata Priyambada Mishra,Sreesankar Rajagopal 机构:. ECE department, RV College of Engineering, Bengaluru, Karnataka, India, . CSE Department, . ECE Department, . Assistant Professor, ECE department 备注:6 pages, 8 figures 摘要:本文阐述了一种在资源受限的边缘设备中进行单音和多音分类的模型。该模型是一种先进的快速、精确、稳定的微门递归神经网络。与以前的假设方法相比,该模型通过使用更小的参数和更高的效率以及采用降噪算法,提高了性能指标和更小的尺寸。该模型作为声学AI模块实现,重点用于声音识别、定位和在AI系统(如自动驾驶汽车)上的部署。此外,随着未来城市和发展中国家对多音分类器的需求增加,本地化技术的加入有可能为自动驾驶车辆中的多音分类器增加一个新的维度。 摘要:This paper elucidates a model for acoustic single and multi-tone classification in resource constrained edge devices. The proposed model is of State-of-the-art Fast Accurate Stable Tiny Gated Recurrent Neural Network. This model has resulted in improved performance metrics and lower size compared to previous hypothesized methods by using lesser parameters with higher efficiency and employment of a noise reduction algorithm. The model is implemented as an acoustic AI module, focused for the application of sound identification, localization, and deployment on AI systems like that of an autonomous car. Further, the inclusion of localization techniques carries the potential of adding a new dimension to the multi-tone classifiers present in autonomous vehicles, as its demand increases in urban cities and developing countries in the future.

【2】 Understanding Dynamic Spatio-Temporal Contexts in Long Short-Term Memory for Road Traffic Speed Prediction 标题:道路交通速度预测中长短期记忆中动态时空语境的理解 链接:https://arxiv.org/abs/2112.02409

作者:Won Kyung Lee,Deuk Sin Kwon,So Young Sohn 机构:Department of Industrial Engineering, Yonsei University, Shinchon-dong, Seoul ,-, Republic of Korea 备注:10pages, 2 tables, 4 figures, 2017 KDD Cup 摘要:可靠的交通流预测对于创建智能交通系统至关重要。已经开发了许多基于大数据的预测方法,但它们不能反映考虑时间和位置的道路之间复杂的动态交互。在这项研究中,我们提出了一个动态局部长短时记忆(LSTM)模型,该模型涉及道路之间的空间和时间依赖性。为此,我们使用局部动态空间权重矩阵及其动态变化。此外,LSTM模型可以处理具有长相关性以及复杂非线性特征的序列数据。实证结果表明,与两种不同的基线方法相比,该模型具有更好的预测性能。 摘要:Reliable traffic flow prediction is crucial to creating intelligent transportation systems. Many big-data-based prediction approaches have been developed but they do not reflect complicated dynamic interactions between roads considering time and location. In this study, we propose a dynamically localised long short-term memory (LSTM) model that involves both spatial and temporal dependence between roads. To do so, we use a localised dynamic spatial weight matrix along with its dynamic variation. Moreover, the LSTM model can deal with sequential data with long dependency as well as complex non-linear features. Empirical results indicated superior prediction performances of the proposed model compared to two different baseline methods.

【3】 STJLA: A Multi-Context Aware Spatio-Temporal Joint Linear Attention Network for Traffic Forecasting 标题:STJLA:一种多上下文感知的时空联合线性关注交通预测网络 链接:https://arxiv.org/abs/2112.02262

作者:Yuchen Fang,Yanjun Qin,Haiyong Luo,Fang Zhao,Chenxing Wang 机构: and Chenxing Wang are withthe School of Computer Science (National Pilot Software EngineeringSchool), Beijing University of Posts and Telecommunications 备注:12 pages 摘要:随着交通大数据的增加,交通预测逐渐引起研究者的关注。因此,如何挖掘交通数据中复杂的时空相关性以更准确地预测交通状况成为一个难题。以前的工作将图卷积网络(GCN)和自我注意机制与深度时间序列模型(如递归神经网络)相结合,分别捕获时空相关性,忽略了跨时间和空间的关系。此外,GCN受到过平滑问题的限制,自我注意受到二次问题的限制,导致GCN缺乏全局表示能力,自我注意不能有效地捕捉全局空间依赖性。在本文中,我们提出了一种新的交通预测深度学习模型,称为多上下文感知时空联合线性注意(STJLA),该模型将线性注意应用于时空联合图,以有效地捕获所有时空节点之间的全局依赖性。更具体地说,STJLA利用静态结构上下文和动态语义上下文来提高模型性能。基于node2vec和一个热编码的静态结构上下文丰富了时空位置信息。此外,基于多头扩散卷积网络的动态空间上下文增强了局部空间感知能力,基于GRU的动态时间上下文分别稳定了线性注意的序列位置信息。在两个真实的交通数据集(英格兰和PEMSD7)上进行的实验表明,我们的STJLA在最先进的基线上可以实现最高9.83%和3.08%的MAE测量精度改进。 摘要:Traffic prediction has gradually attracted the attention of researchers because of the increase in traffic big data. Therefore, how to mine the complex spatio-temporal correlations in traffic data to predict traffic conditions more accurately become a difficult problem. Previous works combined graph convolution networks (GCNs) and self-attention mechanism with deep time series models (e.g. recurrent neural networks) to capture the spatio-temporal correlations separately, ignoring the relationships across time and space. Besides, GCNs are limited by over-smoothing issue and self-attention is limited by quadratic problem, result in GCNs lack global representation capabilities, and self-attention inefficiently capture the global spatial dependence. In this paper, we propose a novel deep learning model for traffic forecasting, named Multi-Context Aware Spatio-Temporal Joint Linear Attention (STJLA), which applies linear attention to the spatio-temporal joint graph to capture global dependence between all spatio-temporal nodes efficiently. More specifically, STJLA utilizes static structural context and dynamic semantic context to improve model performance. The static structure context based on node2vec and one-hot encoding enriches the spatio-temporal position information. Furthermore, the multi-head diffusion convolution network based dynamic spatial context enhances the local spatial perception ability, and the GRU based dynamic temporal context stabilizes sequence position information of the linear attention, respectively. Experiments on two real-world traffic datasets, England and PEMSD7, demonstrate that our STJLA can achieve up to 9.83% and 3.08% accuracy improvement in MAE measure over state-of-the-art baselines.
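线性注意力的核心是用非负特征映射 φ 替换 softmax,把复杂度从 O(N²d) 降到 O(Nd²);下面是一个通用示意(特征映射取 elu(x)+1,并非STJLA的官方实现):

```python
import numpy as np

def phi(x):
    # 非负特征映射 φ(x) = elu(x) + 1
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # 标准注意力: softmax(QK^T)V;线性注意力: φ(Q)(φ(K)^T V) / (φ(Q) φ(K)^T 1)
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                      # (d, d_v),只需遍历一次序列
    normalizer = Qf @ Kf.sum(axis=0)   # (N,)
    return (Qf @ kv) / normalizer[:, None]

N, d, dv = 100, 16, 32
out = linear_attention(np.random.randn(N, d), np.random.randn(N, d), np.random.randn(N, dv))
print(out.shape)   # (100, 32)
```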

点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】 Joint Symmetry Detection and Shape Matching for Non-Rigid Point Cloud 标题:非刚性点云的联合对称性检测与形状匹配 链接:https://arxiv.org/abs/2112.02713

作者:Abhishek Sharma,Maks Ovsjanikov 机构:LIX, Ecole Polytechnique, IPParis, France 备注:Under Review. arXiv admin note: substantial text overlap with arXiv:2110.02994 摘要:尽管深度函数映射在非刚性三维形状匹配中取得了成功,但目前还没有同时对自对称性和形状匹配进行建模的学习框架。尽管对称性失配导致的误差是非刚性形状匹配中的一个主要挑战,这一点仍然存在。在本文中,我们提出了一个新的框架,同时学习自对称性以及一对形状之间的成对映射。我们的关键思想是通过正则化项将自对称映射和成对映射耦合在一起,正则化项为它们提供联合约束,从而获得更精确的映射。我们在几个基准上验证了我们的方法,在这两个任务上,我们的方法都优于许多竞争性基线。 摘要:Despite the success of deep functional maps in non-rigid 3D shape matching, there exists no learning framework that models both self-symmetry and shape matching simultaneously. This is despite the fact that errors due to symmetry mismatch are a major challenge in non-rigid shape matching. In this paper, we propose a novel framework that simultaneously learns both self symmetry as well as a pairwise map between a pair of shapes. Our key idea is to couple a self symmetry map and a pairwise map through a regularization term that provides a joint constraint on both of them, thereby, leading to more accurate maps. We validate our method on several benchmarks where it outperforms many competitive baselines on both tasks.

联邦学习|隐私保护|加密(3篇)

【1】 When the Curious Abandon Honesty: Federated Learning Is Not Private 标题:当好奇抛弃诚实时:联合学习不是私人的 链接:https://arxiv.org/abs/2112.02918

作者:Franziska Boenisch,Adam Dziedzic,Roei Schuster,Ali Shahin Shamsabadi,Ilia Shumailov,Nicolas Papernot 机构:Fraunhofer AISEC, University of Toronto and Vector Institute, Vector Institute and The Alan Turing Institute 摘要:在联邦学习(FL)中,当个人设备联合训练机器学习模型时,数据不会离开个人设备。相反,这些设备与中心方(如公司)共享梯度。由于数据从不“离开”个人设备,FL被视为隐私保护。然而,最近的研究表明,这种保护只不过是一层薄薄的外衣,因为即使是观察渐变的被动攻击者也可以重建单个用户的数据。在本文中,我们认为,以前的工作仍然在很大程度上低估了脆弱性的FL。这是因为以前的努力只考虑被动攻击者是诚实的,但好奇。相反,我们引入了一个活跃且不诚实的攻击者作为中心方,该攻击者能够在用户计算模型梯度之前修改共享模型的权重。我们将修改后的权重称为“陷阱权重”。我们的主动攻击者能够以近乎零的成本完美地恢复用户数据:攻击不需要复杂的优化目标。相反,它利用了模型梯度固有的数据泄漏,并通过恶意改变共享模型的权重来放大这种影响。这些特性使我们的攻击能够扩展到使用大量小批量数据训练的模型。以前工作中的攻击者需要数小时才能恢复单个数据点,我们的方法需要数毫秒才能从完全连接和卷积深度神经网络捕获完整的小批量数据。最后,我们考虑缓解。我们观察到,FL中的差异隐私(DP)的当前实现存在缺陷,因为它们明确信任中央方承担添加DP噪声的关键任务,因此无法针对恶意中央方提供保护。我们还考虑其他防御,并解释为什么它们同样不足。需要对FL进行重大重新设计,以便为用户提供任何有意义的数据隐私形式。 摘要:In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients with a central party (e.g., a company). Because data never "leaves" personal devices, FL is presented as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive attacker observing gradients can reconstruct data of individual users. In this paper, we argue that prior work still largely underestimates the vulnerability of FL. This is because prior efforts exclusively consider passive attackers that are honest-but-curious. Instead, we introduce an active and dishonest attacker acting as the central party, who is able to modify the shared model's weights before users compute model gradients. We call the modified weights "trap weights". Our active attacker is able to recover user data perfectly and at near zero costs: the attack requires no complex optimization objectives. Instead, it exploits inherent data leakage from model gradients and amplifies this effect by maliciously altering the weights of the shared model. These specificities enable our attack to scale to models trained with large mini-batches of data. Where attackers from prior work require hours to recover a single data point, our method needs milliseconds to capture the full mini-batch of data from both fully-connected and convolutional deep neural networks. Finally, we consider mitigations. We observe that current implementations of differential privacy (DP) in FL are flawed, as they explicitly trust the central party with the crucial task of adding DP noise, and thus provide no protection against a malicious central party. We also consider other defenses and explain why they are similarly inadequate. A significant redesign of FL is required for it to provide any meaningful form of data privacy to users.

【2】 Intrinisic Gradient Compression for Federated Learning 标题:用于联合学习的内梯度压缩算法 链接:https://arxiv.org/abs/2112.02656

作者:Luke Melas-Kyriazi,Franklyn Wang 机构:Department of Computer Science, Oxford University, Harvard University, Department of Mathematics, Cambridge, MA 摘要:联邦学习是一个快速发展的研究领域,它使大量客户端能够在各自私有的数据上联合训练机器学习模型。更广泛地采用联邦学习的最大障碍之一是在客户端与服务器之间来回发送模型更新的通信成本,而许多设备带宽受限这一事实更加突出了这一点。在本文中,我们的目标是通过在全参数空间的一个子空间内优化网络来解决这个问题,这一思想在机器学习理论界被称为内在维度(intrinsic dimension)。我们利用内在维度与梯度可压缩性之间的对应关系,导出了一系列低带宽优化算法,我们称之为内在梯度压缩算法。具体地说,我们介绍了该算法族中的三种算法,它们具有不同级别的上传和下载带宽,可用于各种联邦设置,并从理论上保证了它们的性能。最后,在模型参数多达1亿(100M)的大规模联邦学习实验中,我们表明,与当前最先进的梯度压缩方法相比,我们的算法表现非常出色。 摘要:Federated learning is a rapidly-growing area of research which enables a large number of clients to jointly train a machine learning model on privately-held data. One of the largest barriers to wider adoption of federated learning is the communication cost of sending model updates from and to the clients, which is accentuated by the fact that many of these devices are bandwidth-constrained. In this paper, we aim to address this issue by optimizing networks within a subspace of their full parameter space, an idea known as intrinsic dimension in the machine learning theory community. We use a correspondence between the notion of intrinsic dimension and gradient compressibility to derive a family of low-bandwidth optimization algorithms, which we call intrinsic gradient compression algorithms. Specifically, we present three algorithms in this family with different levels of upload and download bandwidth for use in various federated settings, along with theoretical guarantees on their performance. Finally, in large-scale federated learning experiments with models containing up to 100M parameters, we show that our algorithms perform extremely well compared to current state-of-the-art gradient compression methods.
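"在全参数空间的一个低维子空间内优化"这一思想可以用固定随机投影来示意:只需通信 d 维向量 z,而非 D 维梯度。以下为一个玩具目标上的示意,并非论文算法本身:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 10_000, 100                              # 全参数维度 D,内在子空间维度 d
P = rng.normal(size=(D, d)) / np.sqrt(D)        # 固定随机投影,客户端与服务器共享随机种子即可重建
theta0 = rng.normal(size=D)                     # 初始(冻结)参数

def full_grad(theta):
    return theta - 1.0                          # 玩具目标 0.5*||theta - 1||^2 的梯度

z = np.zeros(d)                                 # 实际参与通信/更新的只有 d 维向量 z
for _ in range(300):
    g = P.T @ full_grad(theta0 + P @ z)         # 子空间梯度 = P^T · 全空间梯度
    z -= 0.5 * g
print("子空间内梯度范数:", np.linalg.norm(P.T @ full_grad(theta0 + P @ z)))
```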

【3】 Joint Superposition Coding and Training for Federated Learning over Multi-Width Neural Networks 标题:多宽度神经网络联邦学习的联合叠加编码与训练 链接:https://arxiv.org/abs/2112.02543

作者:Hankyul Baek,Won Joon Yun,Yunseok Kwak,Soyi Jung,Mingyue Ji,Mehdi Bennis,Jihong Park,Joongheon Kim 机构:†Department of Electrical and Computer Engineering, Korea University, Seoul, Republic of Korea, ‡School of Software, Hallym University, Chuncheon, Republic of Korea 备注:10 pages, 7 figures, Accepted to IEEE INFOCOM 2022 摘要:本文旨在集成两种协同技术:联邦学习(FL)和宽度可调的可精简神经网络(SNN)结构。FL通过交换本地训练的移动设备模型来保护数据隐私。通过采用SNN作为本地模型,FL可以灵活地应对移动设备随时间变化的能量容量。然而,结合FL和SNN是非常重要的,特别是在具有时变信道条件的无线连接下。此外,现有的多宽度SNN训练算法对设备间的数据分布非常敏感,因此不适合FL。基于此,我们提出了一种基于通信和节能SNN的FL(命名为SlimFL),它联合使用叠加编码(SC)进行全局模型聚合和叠加训练(ST)用于更新本地模型。通过应用SC,SlimFL交换多个宽度配置的叠加,这些配置在给定通信吞吐量下被尽可能多地解码。SlimFL利用ST对齐不同宽度配置的正向传播,同时避免反向传播期间的宽度间干扰。我们正式证明了SlimFL的收敛性。结果表明,SlimFL不仅通信效率高,而且可以抵消非IID数据分布和恶劣的信道条件,仿真结果也证实了这一点。 摘要:This paper aims to integrate two synergetic technologies, federated learning (FL) and width-adjustable slimmable neural network (SNN) architectures. FL preserves data privacy by exchanging the locally trained models of mobile devices. By adopting SNNs as local models, FL can flexibly cope with the time-varying energy capacities of mobile devices. Combining FL and SNNs is however non-trivial, particularly under wireless connections with time-varying channel conditions. Furthermore, existing multi-width SNN training algorithms are sensitive to the data distributions across devices, so are ill-suited to FL. Motivated by this, we propose a communication and energy-efficient SNN-based FL (named SlimFL) that jointly utilizes superposition coding (SC) for global model aggregation and superposition training (ST) for updating local models. By applying SC, SlimFL exchanges the superposition of multiple width configurations that are decoded as many as possible for a given communication throughput. Leveraging ST, SlimFL aligns the forward propagation of different width configurations, while avoiding the inter-width interference during backpropagation. We formally prove the convergence of SlimFL. The result reveals that SlimFL is not only communication-efficient but also can counteract non-IID data distributions and poor channel conditions, which is also corroborated by simulations.

推理|分析|理解|解释(5篇)

【1】 Physically Consistent Neural Networks for building thermal modeling: theory and analysis 标题:用于建筑热模拟的物理一致性神经网络:理论与分析 链接:https://arxiv.org/abs/2112.03212

作者:Loris Di Natale,Bratislav Svetozarevic,Philipp Heer,Colin N. Jones 机构: Swiss Federal Institute of Technology Lausanne (EPFL) 备注:Preprint submitted to Applied Energy. 12 pages in the main text + 5 in appendix, 11 figures 摘要:由于其高能源强度,建筑在当前世界能源转型中发挥着重要作用。建筑模型无处不在,因为它们在建筑使用寿命的每个阶段都需要,即用于设计、改造和控制操作。基于物理方程的经典白盒模型必然遵循物理定律,但其底层结构的具体设计可能会妨碍其表达能力,从而影响其准确性。另一方面,黑箱模型更适合捕捉非线性建筑动态,因此通常可以获得更好的精度,但它们需要大量数据,并且可能不遵循物理定律,这是神经网络(NN)模型特别常见的问题。为了解决这一已知的泛化问题,最近引入了基于物理的神经网络,研究人员在神经网络结构中引入先验知识,以使其符合已知的基本物理定律,并避免经典的神经网络泛化问题。在这项工作中,我们提出了一种新的物理信息神经网络体系结构,称为物理一致性神经网络(PCNN),它只需要过去的运行数据而无需额外的工程开销,并将先验知识置于与经典神经网络并行运行的线性模块中。我们正式证明了这样的网络在设计上——甚至在未见数据上——相对于不同的控制输入以及室外和相邻区域的温度都是物理一致的。我们在一个案例研究中展示了它们的性能,其中PCNN在3天的预测范围内,精度比基于经典物理的电阻-电容模型高出50%。此外,尽管PCNN的结构受到约束,但其在验证数据上的性能与经典NN相似,对训练数据的过拟合更少,并保持了较高的表达能力以解决泛化问题。 摘要:Due to their high energy intensity, buildings play a major role in the current worldwide energy transition. Building models are ubiquitous since they are needed at each stage of the life of buildings, i.e. for design, retrofitting, and control operations. Classical white-box models, based on physical equations, are bound to follow the laws of physics but the specific design of their underlying structure might hinder their expressiveness and hence their accuracy. On the other hand, black-box models are better suited to capture nonlinear building dynamics and thus can often achieve better accuracy, but they require a lot of data and might not follow the laws of physics, a problem that is particularly common for neural network (NN) models. To counter this known generalization issue, physics-informed NNs have recently been introduced, where researchers introduce prior knowledge in the structure of NNs to ground them in known underlying physical laws and avoid classical NN generalization issues. In this work, we present a novel physics-informed NN architecture, dubbed Physically Consistent NN (PCNN), which only requires past operational data and no engineering overhead, including prior knowledge in a linear module running in parallel to a classical NN. We formally prove that such networks are physically consistent -- by design and even on unseen data -- with respect to different control inputs and temperatures outside and in neighboring zones. We demonstrate their performance on a case study, where the PCNN attains an accuracy up to $50\%$ better than a classical physics-based resistance-capacitance model on $3$-day long prediction horizons. Furthermore, despite their constrained structure, PCNNs attain similar performance to classical NNs on the validation data, overfitting the training data less and retaining high expressiveness to tackle the generalization issue.

【2】 Using Convolutional Neural Networks for fault analysis and alleviation in accelerator systems 标题:卷积神经网络在加速器系统故障分析和缓解中的应用 链接:https://arxiv.org/abs/2112.02657

作者:Jashanpreet Singh Sraw,Deepak M C 机构:Thapar Institute of Engineering and Technology, Patiala, India, PES College of Engineering, Mandya, India 摘要:如今,神经网络几乎是每个技术领域取得突破的基础。它们在加速器上的应用最近使这些系统的性能和效率得到了提高。与此同时,由于最新(萎缩的)半导体技术,硬件故障不断增加,需要加以解决。由于加速器系统通常用于支持时间关键型应用,如自动驾驶汽车或医疗诊断应用,因此必须消除这些硬件故障。我们的研究从系统的角度评估了这些失败。基于我们的结果,我们发现了提高系统可靠性的关键结果,并进一步提出了一种有效的方法,以最小的硬件开销避免这些故障。 摘要:Today, Neural Networks are the basis of breakthroughs in virtually every technical domain. Their application to accelerators has recently resulted in better performance and efficiency in these systems. At the same time, the increasing hardware failures due to the latest (shrinked) semiconductor technology needs to be addressed. Since accelerator systems are often used to back time-critical applications such as self-driving cars or medical diagnosis applications, these hardware failures must be eliminated. Our research evaluates these failures from a systemic point of view. Based on our results, we find critical results for the system reliability enhancement and we further put forth an efficient method to avoid these failures with minimal hardware overhead.

【3】 Modeling Live Video Streaming: Real-Time Classification, QoE Inference, and Field Evaluation 标题:实时视频流建模:实时分类、QoE推断和现场评估 链接:https://arxiv.org/abs/2112.02637

作者:Sharat Chandra Madanapalli,Alex Mathai,Hassan Habibi Gharakheili,Vijay Sivaraman 摘要:在Twitch和YouTube live等平台上,社交媒体、职业体育和视频游戏正在推动实时视频流的快速增长。实时流媒体体验非常容易受到短时网络拥塞的影响,因为客户端播放缓冲区通常不超过几秒钟。不幸的是,识别此类流并测量其QoE以进行网络管理是一项挑战,因为内容提供商在直播和视频点播(VoD)流中基本上使用相同的交付基础设施,并且数据包检查技术(包括SNI/DNS查询监控)无法始终区分这两者。在本文中,我们设计、构建并部署了ReCLive:一种基于网络级行为特征的实时视频检测和QoE测量的机器学习方法。我们的贡献有四个方面:(1)我们分析了来自Twitch和YouTube的约23000个视频流,并确定了其流量特征中区分直播和点播流的关键特征。我们将我们的流量跟踪作为公开数据发布给公众;(2) 我们开发了一个基于LSTM的二进制分类器模型,该模型实时区分实时流和按需流,跨提供商的准确率超过95%;(3) 我们开发了一种方法,根据分辨率和缓冲区暂停事件估计实时流媒体流的QoE度量,总体准确率分别为93%和90%;(4)最后,我们对我们的解决方案进行了原型化,在实验室对其进行了训练,并将其部署到一个服务于7000多名用户的实时ISP网络中。我们的方法为ISP提供对实时视频流的细粒度可见性,使他们能够测量和改善用户体验。 摘要:Social media, professional sports, and video games are driving rapid growth in live video streaming, on platforms such as Twitch and YouTube Live. Live streaming experience is very susceptible to short-time-scale network congestion since client playback buffers are often no more than a few seconds. Unfortunately, identifying such streams and measuring their QoE for network management is challenging, since content providers largely use the same delivery infrastructure for live and video-on-demand (VoD) streaming, and packet inspection techniques (including SNI/DNS query monitoring) cannot always distinguish between the two. In this paper, we design, build, and deploy ReCLive: a machine learning method for live video detection and QoE measurement based on network-level behavioral characteristics. Our contributions are four-fold: (1) We analyze about 23,000 video streams from Twitch and YouTube, and identify key features in their traffic profile that differentiate live and on-demand streaming. We release our traffic traces as open data to the public; (2) We develop an LSTM-based binary classifier model that distinguishes live from on-demand streams in real-time with over 95% accuracy across providers; (3) We develop a method that estimates QoE metrics of live streaming flows in terms of resolution and buffer stall events with overall accuracies of 93% and 90%, respectively; and (4) Finally, we prototype our solution, train it in the lab, and deploy it in a live ISP network serving more than 7,000 subscribers. Our method provides ISPs with fine-grained visibility into live video streams, enabling them to measure and improve user experience.

【4】 Explainable Deep Learning in Healthcare: A Methodological Survey from an Attribution View 标题:医疗保健中可解释的深度学习:归因视角的方法论考察 链接:https://arxiv.org/abs/2112.02625

作者:Di Jin,Elena Sergeeva,Wei-Hung Weng,Geeticka Chauhan,Peter Szolovits 机构:Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA 备注:The first four authors contributed equally, psz is the corresponding author. To appear as an advanced review in WIREs Mechanisms of Disease Journal 摘要:电子病历(EHR)数据的大量收集和深度学习(DL)领域前所未有的技术进步,引发了人们对开发基于DL的诊断、预后和治疗临床决策支持系统的研究兴趣。尽管人们认识到深度学习在医疗保健中的价值,但由于DL的黑箱性质,在实际医疗保健环境中进一步采用的障碍仍然存在。因此,出现了对可解释DL的需求,它允许最终用户在采取行动之前评估模型决策,以了解是接受还是拒绝预测和建议。在这篇综述中,我们重点讨论了DL模型在医疗保健中的可解释性。我们从深入全面地介绍解释性方法开始,作为该领域未来研究人员或临床工作者的方法学参考。除了这些方法的细节之外,我们还讨论了这些方法的优缺点,以及每种方法适用于哪些场景,以便感兴趣的读者能够了解如何在这些方法中进行比较和选择。此外,我们还讨论了这些最初为解决一般领域问题而开发的方法是如何适应和应用于医疗保健问题的,以及它们如何帮助医生更好地理解这些数据驱动技术。总的来说,我们希望这项调查能够帮助人工智能(AI)和临床领域的研究人员和实践者了解我们有哪些方法来增强其DL模型的可解释性,并据此选择最佳的方法。 摘要:The increasing availability of large collections of electronic health record (EHR) data and unprecedented technical advances in deep learning (DL) have sparked a surge of research interest in developing DL based clinical decision support systems for diagnosis, prognosis, and treatment. Despite the recognition of the value of deep learning in healthcare, impediments to further adoption in real healthcare settings remain due to the black-box nature of DL. Therefore, there is an emerging need for interpretable DL, which allows end users to evaluate the model decision making to know whether to accept or reject predictions and recommendations before an action is taken. In this review, we focus on the interpretability of the DL models in healthcare. We start by introducing the methods for interpretability in depth and comprehensively as a methodological reference for future researchers or clinical practitioners in this field. Besides the methods' details, we also include a discussion of advantages and disadvantages of these methods and which scenarios each of them is suitable for, so that interested readers can know how to compare and choose among them for use. Moreover, we discuss how these methods, originally developed for solving general-domain problems, have been adapted and applied to healthcare problems and how they can help physicians better understand these data-driven technologies. Overall, we hope this survey can help researchers and practitioners in both artificial intelligence (AI) and clinical fields understand what methods we have for enhancing the interpretability of their DL models and choose the optimal one accordingly.

【5】 Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference 标题:逻辑收缩:用于高效神经网络推理的学习FPGA网表稀疏性 链接:https://arxiv.org/abs/2112.02346

作者:Erwei Wang,James J. Davis,Georgios-Ilias Stavrou,Peter Y. K. Cheung,George A. Constantinides,Mohamed Abdelfattah 机构:Imperial College London, London, United Kingdom, georgios-, Cornell University, New York, NY, United States 备注:Accepted manuscript uploaded 04/12/21. DOA 22/11/21 摘要:FPGA特定的DNN体系结构使用本机LUT作为独立可训练的推理算子已被证明能够实现良好的区域精度和能量精度权衡。该领域的第一部作品LUTNet展示了标准DNN基准的最先进性能。在本文中,我们提出了这种基于LUT的拓扑的学习优化,与直接使用现成的手工设计的网络相比,这种拓扑的设计效率更高。此类体系结构的现有实现需要手动指定每个LUT的输入数量K。事先选择适当的K是一项挑战,即使是在高粒度(例如每层)下这样做也是一个耗时且容易出错的过程,这使得FPGA的空间灵活性得不到充分利用。此外,以前的工作看到LUT输入随机连接,这不能保证网络拓扑的良好选择。为了解决这些问题,我们提出了逻辑收缩,这是一种细粒度的网表剪枝方法,使K能够自动学习用于FPGA推理的神经网络中的每个LUT。通过删除被确定为低重要性的LUT输入,我们的方法提高了合成加速器的效率。我们的GPU友好型LUT输入移除解决方案能够在训练期间处理大型拓扑,且速度可以忽略不计。通过逻辑收缩,我们将对CIFAR-10进行分类的CNV网络的最佳LUTNet实现的面积和能量效率分别提高1.54倍和1.31倍,同时匹配其精度。这种实现也达到了同样精确、高度修剪的BNN面积效率的2.71倍。在具有Bi Real Net体系结构的ImageNet上,使用逻辑收缩导致合成后面积比LUTNet减少2.67倍,从而实现了以前在当今最大的FPGA上不可能实现的功能。 摘要:FPGA-specific DNN architectures using the native LUTs as independently trainable inference operators have been shown to achieve favorable area-accuracy and energy-accuracy tradeoffs. The first work in this area, LUTNet, exhibited state-of-the-art performance for standard DNN benchmarks. In this paper, we propose the learned optimization of such LUT-based topologies, resulting in higher-efficiency designs than via the direct use of off-the-shelf, hand-designed networks. Existing implementations of this class of architecture require the manual specification of the number of inputs per LUT, K. Choosing appropriate K a priori is challenging, and doing so at even high granularity, e.g. per layer, is a time-consuming and error-prone process that leaves FPGAs' spatial flexibility underexploited. Furthermore, prior works see LUT inputs connected randomly, which does not guarantee a good choice of network topology. To address these issues, we propose logic shrinkage, a fine-grained netlist pruning methodology enabling K to be automatically learned for every LUT in a neural network targeted for FPGA inference. By removing LUT inputs determined to be of low importance, our method increases the efficiency of the resultant accelerators. Our GPU-friendly solution to LUT input removal is capable of processing large topologies during their training with negligible slowdown. With logic shrinkage, we better the area and energy efficiency of the best-performing LUTNet implementation of the CNV network classifying CIFAR-10 by 1.54x and 1.31x, respectively, while matching its accuracy. This implementation also reaches 2.71x the area efficiency of an equally accurate, heavily pruned BNN. On ImageNet with the Bi-Real Net architecture, employment of logic shrinkage results in a post-synthesis area reduction of 2.67x vs LUTNet, allowing for implementation that was previously impossible on today's largest FPGAs.

检测相关(11篇)

【1】 A PubMedBERT-based Classifier with Data Augmentation Strategy for Detecting Medication Mentions in Tweets 标题:一种基于PubMedBERT的带数据增强策略的推文药物提及检测分类器 链接:https://arxiv.org/abs/2112.02998

作者:Qing Han,Shubo Tian,Jinfeng Zhang 机构:Department of Statistics, Florida State University, Tallahassee, Florida, United States 摘要:作为一个主要的社交媒体平台,Twitter每天发布大量用户生成的文本(tweet)。挖掘此类数据可用于解决通过其他方式无法解决的重要社会、公共卫生和应急管理问题。许多文本挖掘管道中的一个重要步骤是实体识别(entity recognition,NER),这对推特数据提出了一些特殊的挑战。其中包括不标准的表达、极端不平衡的类别以及缺乏上下文信息等。生物创造性挑战VII(BC7)的第3轨道旨在评估检测推特中药物提及的方法。在本文中,我们报告了我们在BC7 track 3上的工作,其中我们探索了一种基于PubMedBERT的分类器,该分类器通过多种数据增强方法的组合进行训练。我们的方法F1得分为0.762,大大高于所有提交的平均值(0.696)。 摘要:As a major social media platform, Twitter publishes a large number of user-generated text (tweets) on a daily basis. Mining such data can be used to address important social, public health, and emergency management issues that are infeasible through other means. An essential step in many text mining pipelines is named entity recognition (NER), which presents some special challenges for tweet data. Among them are nonstandard expressions, extreme imbalanced classes, and lack of context information, etc. The track 3 of BioCreative challenge VII (BC7) was organized to evaluate methods for detecting medication mentions in tweets. In this paper, we report our work on BC7 track 3, where we explored a PubMedBERT-based classifier trained with a combination of multiple data augmentation approaches. Our method achieved an F1 score of 0.762, which is substantially higher than the mean of all submissions (0.696).

【2】 Seeing BDD100K in dark: Single-Stage Night-time Object Detection via Continual Fourier Contrastive Learning 标题:在黑暗中看到BDD100K:基于连续傅立叶对比学习的单级夜间目标检测 链接:https://arxiv.org/abs/2112.02891

作者:Ujjal Kr Dutta 摘要:尽管最先进的目标检测器已取得巨大进步,针对夜间目标检测的研究却很少,且为数不多的论文之间采用的评估协议并不统一。除了缺乏解决这一问题的方法外,还缺乏足够大的基准数据集来研究夜间目标检测。最近,大规模的BDD100K数据集被提出,我们认为应将其选为基准,以启动这一领域的研究。至于方法本身,现有的(数量有限的)方法主要基于生成式图像转换,或基于图像增强/光照处理,二者都不自然,不符合人类在夜间观察物体的方式(即聚焦于物体轮廓)。在本文中,我们填补了这3个空白:1)缺乏统一的评估协议(鉴于单阶段检测器的有效性和效率,我们采用单阶段检测器);2)选择用于夜间目标检测基准测试的数据集;3)提出一种解决现有替代方法局限性的新方法。我们的方法利用基于对比学习的特征提取器,通过傅立叶变换从频域中借用信息,并以基于持续学习的方式进行训练。在用于目标检测时(在微调分类和回归层后),学习到的特征有助于达到新的最先进的经验性能,大幅超越众多竞争方法。 摘要:Despite tremendous improvements in state-of-the-art object detectors, addressing object detection in the night-time has been studied only sparsely, that too, via non-uniform evaluation protocols among the limited available papers. In addition to the lack of methods to address this problem, there was also a lack of an adequately large benchmark dataset to study night-time object detection. Recently, the large scale BDD100K was introduced, which, in our opinion, should be chosen as the benchmark, to kickstart research in this area. Now, coming to the methods, existing approaches (limited in number), are mainly either generative image translation based, or image enhancement/ illumination based, neither of which is natural, conforming to how humans see objects in the night time (by focusing on object contours). In this paper, we bridge these 3 gaps: 1. Lack of an uniform evaluation protocol (using a single-stage detector, due to its efficacy, and efficiency), 2. Choice of dataset for benchmarking night-time object detection, and 3. A novel method to address the limitations of current alternatives. Our method leverages a Contrastive Learning based feature extractor, borrowing information from the frequency domain via Fourier transformation, and trained in a continual learning based fashion. The learned features when used for object detection (after fine-tuning the classification and regression layers), help achieve a new state-of-the-art empirical performance, comfortably outperforming an extensive number of competitors.

【3】 Detecting DeFi Securities Violations from Token Smart Contract Code with Random Forest Classification 标题:基于随机森林分类的令牌智能合同码中的Defi证券违规检测 链接:https://arxiv.org/abs/2112.02731

作者:Arianna Trozze,Bennett Kleinberg,Toby Davies 机构:Department of Computer Science, University College London, London, UK 摘要:分散金融(DeFi)是通过各种区块链上的智能合约构建和交付的金融产品与服务系统。在过去一年中,DeFi的热度和市值都大幅增长。然而,它也成为加密货币相关犯罪的中心,特别是各种类型的证券违法行为。DeFi中缺乏"了解你的客户"(KYC)要求,使得政府不确定如何应对这一领域的大量违规行为。本研究旨在通过机器学习方法解决这一问题:根据代币的智能合约代码识别可能涉及证券违规的DeFi项目。我们更广泛地改编了之前在以太坊上检测特定类型证券违规的工作,基于从DeFi项目代币智能合约代码中提取的特征构建了一个随机森林分类器。最终分类器的F1分数达到99.1%。对于任何分类问题,如此高的性能都令人意外;然而进一步的特征层面分析表明,仅凭单一特征就使其成为一个高度可检测的问题。我们研究的另一个贡献是一个新的数据集,包括(a)涉及证券违规的代币的已验证地面真值数据集,以及(b)来自某DeFi聚合商的一组有效代币,该聚合商对其上架项目进行尽职调查。本文进一步讨论了检察官在执法工作中对我们模型的使用,并将其潜在用途与更广泛的法律背景联系起来。 摘要:Decentralized Finance (DeFi) is a system of financial products and services built and delivered through smart contracts on various blockchains. In the past year, DeFi has gained popularity and market capitalization. However, it has also become an epicenter of cryptocurrency-related crime, in particular, various types of securities violations. The lack of Know Your Customer requirements in DeFi has left governments unsure of how to handle the magnitude of offending in this space. This study aims to address this problem with a machine learning approach to identify DeFi projects potentially engaging in securities violations based on their tokens' smart contract code. We adapt prior work on detecting specific types of securities violations across Ethereum more broadly, building a random forest classifier based on features extracted from DeFi projects' tokens' smart contract code. The final classifier achieves a 99.1% F1-score. Such high performance is surprising for any classification problem, however, from further feature-level, we find a single feature makes this a highly detectable problem. Another contribution of our study is a new dataset, comprised of (a) a verified ground truth dataset for tokens involved in securities violations and (b) a set of valid tokens from a DeFi aggregator which conducts due diligence on the projects it lists. This paper further discusses the use of our model by prosecutors in enforcement efforts and connects its potential use to the wider legal context.
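
下面是一个示意性的scikit-learn代码草图(非论文官方实现),展示"从智能合约代码提取特征后训练随机森林并查看特征重要性"的基本流程;其中的特征矩阵与标签均为随机生成的占位数据,特征维度、树的数量等皆为假设值。

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# 假设已从代币智能合约代码中提取了数值特征(如操作码频率等),此处用随机数据代替
rng = np.random.default_rng(0)
X = rng.random((1000, 30))                 # 1000个代币,30维假设特征
y = rng.integers(0, 2, 1000)               # 1=涉嫌证券违规, 0=正常代币

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))

# 查看特征重要性,可用于发现摘要中提到的"单一特征即可高度区分"的情形
top = np.argsort(clf.feature_importances_)[::-1][:5]
print("最重要的特征索引:", top)
```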

【4】 Facial Emotion Characterization and Detection using Fourier Transform and Machine Learning 标题:基于傅立叶变换和机器学习的面部情感表征与检测 链接:https://arxiv.org/abs/2112.02729

作者:Aishwarya Gouru,Shan Suthaharan 机构:Department of Computer Science, University of North Carolina at Greensboro, Greensboro, NC 备注:8 pages, 3 figures 摘要:我们提出了一种基于傅立叶变换的机器学习技术,用于表征和检测面部情绪。在人脸情感分类的机器学习(ML)模型开发中,主要的挑战性任务是从一组训练样本中检测准确的情感特征,并生成特征向量以构建有意义的特征空间和建立ML模型。在本文中,我们假设情感特征隐藏在频域中;因此,可以通过利用频域和掩蔽技术来捕获它们。我们还利用了一个假设,即一个面部情绪与正常面部特征和其他情绪特征相卷积;然而,它们携带线性可分离的空间频率(我们称之为计算情感频率)。因此,我们提出了一种利用快速傅立叶变换(FFT)和矩形窄带频率核以及广泛使用的耶鲁人脸图像数据集的技术。我们使用随机森林(RF)和人工神经网络(ANN)分类器的性能分数作为度量来验证捕获的情感频率的有效性,从而验证假设。我们的发现是,通过提出的方法发现的计算情感频率提供了有意义的情感特征,帮助RF和ANN实现平均93%以上的高精度分数。 摘要:We present a Fourier-based machine learning technique that characterizes and detects facial emotions. The main challenging task in the development of machine learning (ML) models for classifying facial emotions is the detection of accurate emotional features from a set of training samples, and the generation of feature vectors for constructing a meaningful feature space and building ML models. In this paper, we hypothesis that the emotional features are hidden in the frequency domain; hence, they can be captured by leveraging the frequency domain and masking techniques. We also make use of the conjecture that a facial emotions are convoluted with the normal facial features and the other emotional features; however, they carry linearly separable spatial frequencies (we call computational emotional frequencies). Hence, we propose a technique by leveraging fast Fourier transform (FFT) and rectangular narrow-band frequency kernels, and the widely used Yale-Faces image dataset. We test the hypothesis using the performance scores of the random forest (RF) and the artificial neural network (ANN) classifiers as the measures to validate the effectiveness of the captured emotional frequencies. Our finding is that the computational emotional frequencies discovered by the proposed approach provides meaningful emotional features that help RF and ANN achieve a high precision scores above 93%, on average.
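
下面给出一个基于numpy的示意代码(非论文官方实现),说明"对人脸图像做FFT并用矩形窄带掩码提取分频带特征"的思路;频带划分、统计量选择和图像尺寸均为假设,论文中这些特征随后被送入随机森林/人工神经网络分类器。

```python
import numpy as np

def bandpass_fft_features(image, low, high):
    """对灰度人脸图像做2D FFT,并用矩形窄带掩码保留指定频带(示意实现)。"""
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    # 以频谱中心为原点,用切比雪夫距离近似"矩形窄带"
    r = np.maximum(np.abs(yy - h // 2), np.abs(xx - w // 2))
    mask = (r >= low) & (r < high)
    band = np.where(mask, f, 0)
    mag = np.abs(band)
    # 返回该频带幅值的统计量作为特征
    return np.array([mag.mean(), mag.std(), mag.max()])

face = np.random.rand(64, 64)              # 此处用随机图像代替真实人脸
features = np.concatenate([bandpass_fft_features(face, lo, lo + 4)
                           for lo in range(0, 32, 4)])
print(features.shape)                      # 每4个频率一档,共8档×3个统计量
```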

【5】 Ensemble and Mixed Learning Techniques for Credit Card Fraud Detection 标题:集成和混合学习技术在信用卡诈骗检测中的应用 链接:https://arxiv.org/abs/2112.02627

作者:Daniel H. M. de Souza,Claudio J. Bordin Jr 机构: for setups with large datasets and labeledUniversidade Federal do ABC 摘要:虚假信用卡交易是财务损失的重要来源,因此迫切需要开发准确的欺诈检测算法。在本文中,我们使用机器学习策略来实现这一目标。首先,我们将一种混合学习技术应用于手边的问题,该技术在训练分类之前使用K-means预处理。接下来,我们介绍一种自适应检测器集成技术,该技术使用OR逻辑算法聚合来提高检测率。然后,这两种策略在使用真实事务数据的数值模拟中串联部署。我们从仿真结果中观察到,所提出的方法降低了计算成本,提高了与最新技术相关的性能。 摘要:Spurious credit card transactions are a significant source of financial losses and urge the development of accurate fraud detection algorithms. In this paper, we use machine learning strategies for such an aim. First, we apply a mixed learning technique that uses K-means preprocessing before trained classification to the problem at hand. Next, we introduce an adapted detector ensemble technique that uses OR-logic algorithm aggregation to enhance the detection rate. Then, both strategies are deployed in tandem in numerical simulations using real-world transactions data. We observed from simulation results that the proposed methods diminished computational cost and enhanced performance concerning state-of-the-art techniques.
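
下面是一个示意性的scikit-learn代码草图(非论文官方实现),演示摘要中的两个要点:先用K-means预处理(将簇标签拼接为附加特征)再训练分类器,以及用OR逻辑聚合多个基检测器的预测;数据、簇数和基检测器的选择均为假设。

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((5000, 10))                     # 假设的交易特征
y = (rng.random(5000) < 0.02).astype(int)      # 约2%为欺诈交易

# 混合学习:先用K-means得到簇标签,作为附加特征再训练分类器
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
X_aug = np.hstack([X, np.eye(8)[km.labels_]])  # 拼接one-hot簇标签

# 检测器集成:多个基检测器的预测用OR逻辑聚合,以提高检测率
detectors = [LogisticRegression(max_iter=1000).fit(X_aug, y),
             DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_aug, y)]
preds = np.column_stack([d.predict(X_aug) for d in detectors])
fraud_flag = preds.any(axis=1).astype(int)     # 任一检测器报警即判为欺诈
print("标记为欺诈的交易数:", fraud_flag.sum())
```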

【6】 Anomaly Detection of Wind Turbine Time Series using Variational Recurrent Autoencoders 标题:基于变分递归自动编码器的风电机组时间序列异常检测 链接:https://arxiv.org/abs/2112.02468

作者:Alan Preciado-Grijalva,Victor Rodrigo Iza-Teran 机构:Hochschule Bonn-Rhein-Sieg, Fraunhofer Center for Machine Learning and SCAI 摘要:风力涡轮机叶片中的积冰可能导致叶片出现异常旋转或根本不旋转,从而影响发电和功率输出。在这项工作中,我们研究了风力涡轮机中的积冰问题,将其作为多变量时间序列的异常检测。我们的方法集中在两个主要部分:第一,使用变分递归自动编码器(VRAE)学习时间序列的低维表示;第二,使用无监督聚类算法将学习到的表示分类为正常(无积冰)或异常(积冰)。我们已经在定制的风力涡轮机时间序列数据集上评估了我们的方法,对于两类问题(一个正常与一个异常),我们在测试数据上获得了高达96%的分类精度。对于多类问题(一个正常类与多个异常类),我们对低维学习潜在空间进行了定性分析,深入了解了我们的方法解决此类问题的能力。复制这项工作的代码可以在这里找到https://github.com/agrija9/Wind-Turbines-VRAE-Paper. 摘要:Ice accumulation in the blades of wind turbines can cause them to describe anomalous rotations or no rotations at all, thus affecting the generation of electricity and power output. In this work, we investigate the problem of ice accumulation in wind turbines by framing it as anomaly detection of multi-variate time series. Our approach focuses on two main parts: first, learning low-dimensional representations of time series using a Variational Recurrent Autoencoder (VRAE), and second, using unsupervised clustering algorithms to classify the learned representations as normal (no ice accumulated) or abnormal (ice accumulated). We have evaluated our approach on a custom wind turbine time series dataset, for the two-classes problem (one normal versus one abnormal class), we obtained a classification accuracy of up to 96$\%$ on test data. For the multiple-class problem (one normal versus multiple abnormal classes), we present a qualitative analysis of the low-dimensional learned latent space, providing insights into the capacities of our approach to tackle such problem. The code to reproduce this work can be found here https://github.com/agrija9/Wind-Turbines-VRAE-Paper.
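
下面给出一个最小的PyTorch示意代码(非论文官方实现),勾勒"用变分递归自动编码器(VRAE)学习时间序列低维表示,再对潜在向量做无监督聚类"的流程;传感器维度、序列长度、潜在维度等均为假设值。

```python
import torch
import torch.nn as nn

class VRAE(nn.Module):
    """示意性的变分递归自动编码器:把多变量时间序列压缩为低维潜在向量。"""
    def __init__(self, n_feats=5, hidden=32, latent=4):
        super().__init__()
        self.enc = nn.LSTM(n_feats, hidden, batch_first=True)
        self.mu, self.logvar = nn.Linear(hidden, latent), nn.Linear(hidden, latent)
        self.dec_in = nn.Linear(latent, hidden)
        self.dec = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_feats)

    def forward(self, x):
        _, (h, _) = self.enc(x)
        mu, logvar = self.mu(h[-1]), self.logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # 重参数化采样
        h0 = self.dec_in(z).unsqueeze(1).repeat(1, x.size(1), 1)
        recon, _ = self.dec(h0)
        return self.out(recon), mu, logvar

model = VRAE()
x = torch.randn(16, 50, 5)                    # 16段风机传感器序列,每段50个时间步
recon, mu, logvar = model(x)
loss = ((recon - x) ** 2).mean() - 0.5 * (1 + logvar - mu**2 - logvar.exp()).mean()
loss.backward()
# 训练完成后,可对mu做K-means等无监督聚类,把簇划分为"正常/结冰"两类
```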

【7】 Dense Extreme Inception Network for Edge Detection 标题:用于边缘检测的稠密极端初始网络 链接:https://arxiv.org/abs/2112.02250

作者:Xavier Soria Poma,Angel Sappa,Patricio Humanante,Arash Arbarinia 机构:Computer Vision Center, Autonomous University of Barcelona, Barcelona, Spain, National University of Chimborazo, Riobamba, Ecuador, ESPOL Polytechnic University, FIEC, CIDIS, Guayaquil, Ecuador 备注:Paper submitted to an Elsevier journal 摘要:边缘检测是许多计算机视觉应用的基础。最先进的技术主要依赖于深度学习,有两个决定性因素:数据集内容和网络架构。大多数公开可用的数据集都不是为边缘检测任务而设计的。在这里,我们为这个约束提供了一个解决方案。首先,我们认为边缘、轮廓和边界,尽管它们相互重叠,但它们是三种不同的视觉特征,需要单独的基准数据集。为此,我们提出了一个新的边数据集。其次,我们提出了一种新的结构,称为稠密极限初始边缘检测网络(DexiNed),它可以从零开始训练,而无需任何预先训练的权重。在所提供的数据集中,DexiNed的性能优于其他算法。它还可以很好地推广到其他数据集,而无需任何微调。由于其输出的边缘更加锐利和精细,DexiNed的更高质量在视觉上也是显而易见的。 摘要:Edge detection is the basis of many computer vision applications. State of the art predominantly relies on deep learning with two decisive factors: dataset content and network's architecture. Most of the publicly available datasets are not curated for edge detection tasks. Here, we offer a solution to this constraint. First, we argue that edges, contours and boundaries, despite their overlaps, are three distinct visual features requiring separate benchmark datasets. To this end, we present a new dataset of edges. Second, we propose a novel architecture, termed Dense Extreme Inception Network for Edge Detection (DexiNed), that can be trained from scratch without any pre-trained weights. DexiNed outperforms other algorithms in the presented dataset. It also generalizes well to other datasets without any fine-tuning. The higher quality of DexiNed is also perceptually evident thanks to the sharper and finer edges it outputs.

【8】 PhishMatch: A Layered Approach for Effective Detection of Phishing URLs 标题:PhishMatch:一种有效检测钓鱼URL的分层方法 链接:https://arxiv.org/abs/2112.02226

作者:Harshal Tupsamudre,Sparsh Jain,Sachin Lodha 机构:TCS Research, India 摘要:网络钓鱼攻击仍然是互联网上的一个重大威胁。先前的研究表明,仅仅通过更仔细地分析网站的URL,就可以确定网站是否存在网络钓鱼行为。基于URL的方法的一个主要优点是,它甚至可以在网页在浏览器中呈现之前识别钓鱼网站,从而避免其他潜在问题,如加密劫持和驾车下载。然而,传统的基于URL的方法有其局限性。基于黑名单的方法容易受到零小时网络钓鱼攻击,基于高级机器学习的方法消耗大量资源,其他方法将URL发送到远程服务器,这会损害用户的隐私。在本文中,我们提出了一种分层的反钓鱼防御,PhishMatch,它是健壮的、准确的、廉价的和客户端的。我们设计了一种用于精确字符串匹配的空时高效Aho-Corasick算法和用于近似字符串匹配的基于n-gram的索引技术,以检测钓鱼URL中的各种网络抢注技术。为了减少误报,我们使用了全局白名单和个性化用户白名单。我们还确定访问URL的上下文,并使用该信息对输入URL进行更准确的分类。PhishMatch的最后一个组件涉及机器学习模型和受控搜索引擎查询,以对URL进行分类。为Chrome浏览器开发的PhishMatch原型插件被发现速度快、重量轻。我们的评估表明,PhishMatch既高效又有效。 摘要:Phishing attacks continue to be a significant threat on the Internet. Prior studies show that it is possible to determine whether a website is phishing or not just by analyzing its URL more carefully. A major advantage of the URL based approach is that it can identify a phishing website even before the web page is rendered in the browser, thus avoiding other potential problems such as cryptojacking and drive-by downloads. However, traditional URL based approaches have their limitations. Blacklist based approaches are prone to zero-hour phishing attacks, advanced machine learning based approaches consume high resources, and other approaches send the URL to a remote server which compromises user's privacy. In this paper, we present a layered anti-phishing defense, PhishMatch, which is robust, accurate, inexpensive, and client-side. We design a space-time efficient Aho-Corasick algorithm for exact string matching and n-gram based indexing technique for approximate string matching to detect various cybersquatting techniques in the phishing URL. To reduce false positives, we use a global whitelist and personalized user whitelists. We also determine the context in which the URL is visited and use that information to classify the input URL more accurately. The last component of PhishMatch involves a machine learning model and controlled search engine queries to classify the URL. A prototype plugin of PhishMatch, developed for the Chrome browser, was found to be fast and lightweight. Our evaluation shows that PhishMatch is both efficient and effective.
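
下面用纯Python给出一个高度简化的示意(并非PhishMatch的实现):用n-gram集合的Jaccard相似度做近似字符串匹配,发现与品牌域名"形似而不同"的可疑仿冒域名;白名单与阈值均为假设,论文中的精确匹配部分采用Aho-Corasick算法,并结合全局/个性化白名单、访问上下文与机器学习模型等更多组件。

```python
def ngrams(s, n=3):
    s = f"${s}$"                       # 加边界符号
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

# 假设的品牌域名白名单
brands = ["paypal.com", "google.com", "amazon.com"]
index = {b: ngrams(b) for b in brands}

def check_url_host(host, threshold=0.5):
    """近似匹配:与某品牌域名高度相似但不完全相同时,判为可疑仿冒。"""
    if host in index:                  # 精确匹配(论文中用Aho-Corasick实现)
        return "benign"
    grams = ngrams(host)
    best = max(index, key=lambda b: jaccard(grams, index[b]))
    if jaccard(grams, index[best]) > threshold:
        return f"suspicious (looks like {best})"
    return "unknown"

print(check_url_host("paypa1.com"))    # 典型的字符替换式网络抢注
print(check_url_host("example.org"))
```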

【9】 Behind the Curtain: Learning Occluded Shapes for 3D Object Detection 标题:幕后:学习用于3D对象检测的遮挡形状 链接:https://arxiv.org/abs/2112.02205

作者:Qiangeng Xu,Yiqi Zhong,Ulrich Neumann 机构:University of Southern California 备注:None 摘要:激光雷达传感器的进步提供了丰富的3D数据,支持3D场景理解。然而,由于遮挡和信号缺失,激光雷达点云实际上是2.5D的,因为它们只覆盖部分底层形状,这对3D感知提出了根本性的挑战。为了应对这一挑战,我们提出了一种新的基于激光雷达的三维物体检测模型,称为幕后探测器(BtcDet),该模型学习物体形状先验知识,并估计点云中部分被遮挡的完整物体形状。BtcDet首先识别受遮挡和信号缺失影响的区域。在这些区域中,我们的模型预测占用的概率,这表明区域是否包含对象形状。与此概率图集成,BtcDet可以生成高质量的3D提案。最后,占用概率还集成到提案细化模块中,以生成最终边界框。在KITTI数据集和Waymo开放数据集上的大量实验证明了BtcDet的有效性。特别是在KITTI基准上对汽车和自行车的3D检测方面,BtcDet以惊人的优势超过了所有已发布的最新方法。代码已发布:https://github.com/Xharlie/BtcDet 。 摘要:Advances in LiDAR sensors provide rich 3D data that supports 3D scene understanding. However, due to occlusion and signal miss, LiDAR point clouds are in practice 2.5D as they cover only partial underlying shapes, which poses a fundamental challenge to 3D perception. To tackle the challenge, we present a novel LiDAR-based 3D object detection model, dubbed Behind the Curtain Detector (BtcDet), which learns the object shape priors and estimates the complete object shapes that are partially occluded (curtained) in point clouds. BtcDet first identifies the regions that are affected by occlusion and signal miss. In these regions, our model predicts the probability of occupancy that indicates if a region contains object shapes. Integrated with this probability map, BtcDet can generate high-quality 3D proposals. Finally, the probability of occupancy is also integrated into a proposal refinement module to generate the final bounding boxes. Extensive experiments on the KITTI Dataset and the Waymo Open Dataset demonstrate the effectiveness of BtcDet. Particularly, for the 3D detection of both cars and cyclists on the KITTI benchmark, BtcDet surpasses all of the published state-of-the-art methods by remarkable margins. Code is released at https://github.com/Xharlie/BtcDet.

【10】 Online false discovery rate control for anomaly detection in time series 标题:时间序列异常检测的在线误发率控制 链接:https://arxiv.org/abs/2112.03196

作者:Quentin Rebjock,Barış Kurt,Tim Januschowski,Laurent Callot 机构:EPFL, Amazon Research 摘要:本文提出了面向时间序列在线异常检测的错误发现率控制(FDRC)新规则。在线FDRC规则允许控制一系列统计检验的性质。在异常检测的场景中,原假设是观测值正常,备择假设是观测值异常。FDRC规则允许用户在无监督设置下设定精确率的下界。本文提出的方法克服了以往FDRC规则在异常检测场景中的不足,特别是确保即使备择假设极为罕见(异常检测中的典型情形)且检验统计量存在序列相关(时间序列中的典型情形),检验功效仍保持较高。我们在理论和实验上都证明了这些规则的可靠性。 摘要:This article proposes novel rules for false discovery rate control (FDRC) geared towards online anomaly detection in time series. Online FDRC rules allow to control the properties of a sequence of statistical tests. In the context of anomaly detection, the null hypothesis is that an observation is normal and the alternative is that it is anomalous. FDRC rules allow users to target a lower bound on precision in unsupervised settings. The methods proposed in this article overcome short-comings of previous FDRC rules in the context of anomaly detection, in particular ensuring that power remains high even when the alternative is exceedingly rare (typical in anomaly detection) and the test statistics are serially dependent (typical in time series). We show the soundness of these rules in both theory and experiments.
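
为说明"在线FDR控制"的基本机制,下面给出一个简化的alpha-investing示意代码(注意:这并非本文提出的FDRC规则,仅用于演示"每次检验消耗一部分alpha财富、发现异常后补充财富"的思想);财富初值、回报与投入策略均为假设值。

```python
def alpha_investing(p_values, wealth=0.05, payout=0.05):
    """简化的alpha-investing在线多重检验示意(非本文规则)。"""
    decisions = []
    for p in p_values:
        alpha_t = wealth / 2            # 本次检验投入的显著性水平(简化策略)
        reject = p <= alpha_t
        # 更新alpha财富:拒绝原假设(发现异常)获得回报,否则支付成本
        wealth += payout if reject else -alpha_t / (1 - alpha_t)
        wealth = max(wealth, 0.0)
        decisions.append(reject)
    return decisions

# p值流可以来自对每个时间点"观测值是否正常"的检验
stream = [0.001, 0.40, 0.73, 0.0004, 0.20, 0.03, 0.88]
print(alpha_investing(stream))
```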

【11】 Autoencoders for Semivisible Jet Detection 标题:用于半可见光射流探测的自动编码器 链接:https://arxiv.org/abs/2112.02864

作者:Florencia Canelli,Annapaola de Cosa,Luc Le Pottier,Jeremi Niedziela,Kevin Pedro,Maurizio Pierini 机构:University of Zurich, Switzerland, ETH Zurich, Switzerland, University of California, Berkeley, USA, Fermi National Accelerator Laboratory, Batavia, IL, USA, European Organization for Nuclear Research (CERN), Switzerland 备注:16 pages, 10 figures 摘要:来自禁闭暗区的暗物质粒子的产生可能导致许多新的实验特征。根据理论的细节,质子-质子碰撞中产生的暗夸克可能会产生半可见的粒子喷流:由暗强子组成的准直喷流,其中只有部分成分能被对撞机实验探测到。其实验特征是存在与喷流可见成分共线的重建缺失动量。这种复杂的拓扑结构对探测器低效率和产生人为缺失动量的错误重建非常敏感。在这项工作中,我们提出了一种与信号无关的策略,通过异常检测技术来排除普通喷流并识别半可见喷流。以喷流子结构变量为输入的深度神经自编码器网络被证明对分析异常喷流非常有用。本研究聚焦于半可见喷流特征;不过,该技术可以应用于任何预测含非标准模型粒子喷流特征的新物理模型。 摘要:The production of dark matter particles from confining dark sectors may lead to many novel experimental signatures. Depending on the details of the theory, dark quark production in proton-proton collisions could result in semivisible jets of particles: collimated sprays of dark hadrons of which only some are detectable by particle collider experiments. The experimental signature is characterised by the presence of reconstructed missing momentum collinear with the visible components of the jets. This complex topology is sensitive to detector inefficiencies and mis-reconstruction that generate artificial missing momentum. With this work, we propose a signal-agnostic strategy to reject ordinary jets and identify semivisible jets via anomaly detection techniques. A deep neural autoencoder network with jet substructure variables as input proves highly useful for analyzing anomalous jets. The study focuses on the semivisible jet signature; however, the technique can apply to any new physics model that predicts signatures with jets from non-SM particles.

分类|识别(8篇)

【1】 DANets: Deep Abstract Networks for Tabular Data Classification and Regression 标题:DANet:用于表格数据分类和回归的深层抽象网络 链接:https://arxiv.org/abs/2112.02962

作者:Jintai Chen,Kuanlun Liao,Yao Wan,Danny Z. Chen,Jian Wu 机构:College of Computer Science and Technology, Zhejiang University, Hangzhou, China, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China 备注:@inproceedings{danets, title={DANets: Deep Abstract Networks for Tabular Data Classification and Regression}, author={Chen, Jintai and Liao, Kuanlun and Wan, Yao and Chen, Danny Z and Wu, Jian}, booktitle={AAAI}, year={2022} } 摘要:表格数据在现实世界的应用中无处不在。尽管机器学习社区已经开发了许多常用的神经组件(例如卷积)和可扩展的神经网络(例如ResNet),但其中很少有对表格数据有效的,也很少有专为表格数据结构设计的。在本文中,我们提出了一种新颖灵活的表格数据神经组件,称为抽象层(AbstLay),它学习显式地对相关输入特征进行分组,并生成更高层次的语义抽象特征。此外,我们还设计了一种结构重参数化方法来压缩AbstLay,从而在推理阶段显著降低计算复杂度。我们使用AbstLay构建了一种特殊的基本块,并通过堆叠这些块构造了一系列用于表格数据分类和回归的深度抽象网络(DANets)。在DANets中,引入了一条特殊的捷径路径,以从原始表格特征中获取信息,帮助不同层级之间的特征交互。在七个真实表格数据集上的综合实验表明,我们的AbstLay和DANets对于表格数据分类和回归是有效的,并且计算复杂度优于竞争方法。此外,我们还评估了DANet随深度增加的性能增益,验证了我们方法的可扩展性。我们的代码可在https://github.com/WhatAShot/DANet 获取。 摘要:Tabular data are ubiquitous in real world applications. Although many commonly-used neural components (e.g., convolution) and extensible neural networks (e.g., ResNet) have been developed by the machine learning community, few of them were effective for tabular data and few designs were adequately tailored for tabular data structures. In this paper, we propose a novel and flexible neural component for tabular data, called Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. Also, we design a structure re-parameterization method to compress AbstLay, thus reducing the computational complexity by a clear margin in the reference phase. A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks. In DANets, a special shortcut path is introduced to fetch information from raw tabular features, assisting feature interactions across different levels. Comprehensive experiments on seven real-world tabular datasets show that our AbstLay and DANets are effective for tabular data classification and regression, and the computational complexity is superior to competitive methods. Besides, we evaluate the performance gains of DANet as it goes deep, verifying the extendibility of our method. Our code is available at https://github.com/WhatAShot/DANet.

【2】 Interpretable Image Classification with Differentiable Prototypes Assignment 标题:基于不同原型赋值的可解释图像分类 链接:https://arxiv.org/abs/2112.02902

作者:Dawid Rymarczyk,Łukasz Struski,Michał Górszczak,Koryna Lewandowska,Jacek Tabor,Bartosz Zieliński 机构:Michał G´orszczak, Bartosz Zieli´nski, Jagiellonian University, Ardigen SA, Department of Cognitive Neuroscience and Neuroergonomics, Institute of Applied Psychology 备注:Code will be published after paper acceptance 摘要:我们介绍ProtoPool,一个可解释的图像分类模型,它有一个由类共享的原型池。与现有方法相比,该训练更为直接,因为它不需要修剪阶段。它是通过引入原型对特定类的完全可微赋值来实现的。此外,我们还引入了一种新的焦点相似性函数,将模型聚焦在罕见的前景特征上。我们表明,ProtoPool在CUB-200-2011和斯坦福汽车数据集上获得了最先进的准确性,大大减少了原型的数量。我们提供了该方法的理论分析和用户研究,以表明我们的原型比通过竞争方法获得的原型更具特色。 摘要:We introduce ProtoPool, an interpretable image classification model with a pool of prototypes shared by the classes. The training is more straightforward than in the existing methods because it does not require the pruning stage. It is obtained by introducing a fully differentiable assignment of prototypes to particular classes. Moreover, we introduce a novel focal similarity function to focus the model on the rare foreground features. We show that ProtoPool obtains state-of-the-art accuracy on the CUB-200-2011 and the Stanford Cars datasets, substantially reducing the number of prototypes. We provide a theoretical analysis of the method and a user study to show that our prototypes are more distinctive than those obtained with competitive methods.
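
下面给出一个示意性的PyTorch代码草图(非ProtoPool的官方实现),演示"由各类共享的原型池 + 可微的原型-类槽位分配"这一核心思想;此处用softmax分配与负欧氏距离相似度作近似,省略了论文中的焦点相似性函数等细节,特征维度、原型数与槽位数均为假设值。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypePoolHead(nn.Module):
    """示意:所有类共享一个原型池,通过可微的分配矩阵把原型关联到各类的槽位。"""
    def __init__(self, feat_dim=128, num_protos=30, num_classes=10, slots_per_class=3):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_protos, feat_dim))
        # 每个类的每个槽位对原型池的分配logits(训练中可逐渐"硬化"为one-hot)
        self.assign = nn.Parameter(torch.randn(num_classes, slots_per_class, num_protos))
        self.classifier = nn.Linear(num_classes * slots_per_class, num_classes, bias=False)

    def forward(self, feats):
        # feats: (batch, feat_dim),由卷积骨干网络输出
        sim = -torch.cdist(feats, self.prototypes)            # 与每个原型的相似度
        a = F.softmax(self.assign, dim=-1)                    # (C, S, P) 可微分配
        slot_sim = torch.einsum("bp,csp->bcs", sim, a)        # 每个类槽位的相似度
        return self.classifier(slot_sim.flatten(1))           # 分类logits

head = PrototypePoolHead()
logits = head(torch.randn(8, 128))
print(logits.shape)                                           # torch.Size([8, 10])
```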

【3】 Unfairness Despite Awareness: Group-Fair Classification with Strategic Agents 标题:有意识的不公平:基于战略代理人的群体公平分类 链接:https://arxiv.org/abs/2112.02746

作者:Andrew Estornell,Sanmay Das,Yang Liu,Yevgeniy Vorobeychik 机构: Washington University in Saint Louis, George Mason University, University of California Santa Cruz 摘要:在影响人们财务、社会和政治福祉的领域中使用算法决策系统,使得人们要求这些决策系统在一些公认的公平概念下“公平”。这一需求反过来又激发了大量的工作,重点是开发公平学习算法,然后用这些算法代替传统算法。对这种公平算法的大多数分析都是基于这样的假设,即受算法决策影响的人被表示为不变的特征向量。然而,战略代理人可能拥有操纵这一观察到的特征向量的能力和动机,以便获得更有利的结果。我们探讨了战略代理人行为可能对公平分类器产生的影响,并得出了在公平分类器考虑的相同公平性度量下,这种行为导致公平分类器变得比传统分类器更不公平的条件。这些条件与公平分类器补救原始未经处理数据不公平的方式有关:公平分类器补救不公平的方式是,当代理具有战略意义时,通过变得比其常规对手更具选择性而变得比其对手更不公平。我们进一步证明,当在传统分类器的决策边界附近(和有利侧)优势群体过度代表的领域上执行公平学习时,公平分类器的选择性增加,并因此导致公平性损失。最后,我们使用一些数据集和学习方法通过实验观察到,这种公平性反转是常见的,并且我们对公平性反转条件的理论描述确实适用于大多数此类情况。 摘要:The use of algorithmic decision making systems in domains which impact the financial, social, and political well-being of people has created a demand for these decision making systems to be "fair" under some accepted notion of equity. This demand has in turn inspired a large body of work focused on the development of fair learning algorithms which are then used in lieu of their conventional counterparts. Most analysis of such fair algorithms proceeds from the assumption that the people affected by the algorithmic decisions are represented as immutable feature vectors. However, strategic agents may possess both the ability and the incentive to manipulate this observed feature vector in order to attain a more favorable outcome. We explore the impact that strategic agent behavior could have on fair classifiers and derive conditions under which this behavior leads to fair classifiers becoming less fair than their conventional counterparts under the same measure of fairness that the fair classifier takes into account. These conditions are related to the the way in which the fair classifier remedies unfairness on the original unmanipulated data: fair classifiers which remedy unfairness by becoming more selective than their conventional counterparts are the ones that become less fair than their counterparts when agents are strategic. We further demonstrate that both the increased selectiveness of the fair classifier, and consequently the loss of fairness, arises when performing fair learning on domains in which the advantaged group is overrepresented in the region near (and on the beneficial side of) the decision boundary of conventional classifiers. Finally, we observe experimentally, using several datasets and learning methods, that this fairness reversal is common, and that our theoretical characterization of the fairness reversal conditions indeed holds in most such cases.

【4】 Beyond Robustness: Resilience Verification of Tree-Based Classifiers 标题:超越稳健性:基于树的分类器的弹性验证 链接:https://arxiv.org/abs/2112.02705

作者:Stefano Calzavara,Lorenzo Cazzaro,Claudio Lucchese,Federico Marcuzzi,Salvatore Orlando 机构:∗Department of Environmental Sciences, Informatics and Statistics, Ca’ Foscari University of Venice, Italy. 摘要:在本文中,我们批评了传统上用于评估在敌对环境中部署的机器学习模型性能的鲁棒性度量。为了缓解健壮性的局限性,我们引入了一种新的度量方法,称为恢复力,并将重点放在它的验证上。特别是,我们讨论了如何通过将传统的鲁棒性验证技术与数据无关的稳定性分析相结合来验证弹性,该稳定性分析识别了特征空间的一个子集,其中模型在对抗性操作下不会改变其预测。然后,我们为决策树和决策树集合引入了一种形式上可靠的独立于数据的稳定性分析,我们在公共数据集上对其进行了实验评估,并利用其进行弹性验证。我们的结果表明,弹性验证在实践中是有用和可行的,可以对标准和稳健的决策树模型进行更可靠的安全评估。 摘要:In this paper we criticize the robustness measure traditionally employed to assess the performance of machine learning models deployed in adversarial settings. To mitigate the limitations of robustness, we introduce a new measure called resilience and we focus on its verification. In particular, we discuss how resilience can be verified by combining a traditional robustness verification technique with a data-independent stability analysis, which identifies a subset of the feature space where the model does not change its predictions despite adversarial manipulations. We then introduce a formally sound data-independent stability analysis for decision trees and decision tree ensembles, which we experimentally assess on public datasets and we leverage for resilience verification. Our results show that resilience verification is useful and feasible in practice, yielding a more reliable security assessment of both standard and robust decision tree models.

【5】 Training Structured Neural Networks Through Manifold Identification and Variance Reduction 标题:基于流形辨识和减方差的结构化神经网络训练 链接:https://arxiv.org/abs/2112.02612

作者:Zih-Syuan Huang,Ching-pei Lee 机构:Institute of Statistical Sciences, Academia Sinica, Taipei, Taiwan 摘要:本文提出了一种训练神经网络(NNs)的算法(RMDA),该算法利用正则化项来促进所需的结构。RMDA相比带动量的近端SGD不引入额外计算,并且在不要求目标函数为有限和形式的情况下实现方差缩减。借助非线性优化中的流形识别工具,我们证明了经过有限次迭代后,RMDA的所有迭代点都具有与渐近收敛的稳定点处正则化项所诱导的结构相同的期望结构,即使存在数据增强和dropout等使训练过程复杂化的工程技巧时也是如此。训练具有结构化稀疏性的神经网络的实验证实,方差缩减对于这种识别是必要的,并且表明RMDA因此显著优于用于此任务的现有方法。对于非结构化稀疏性,RMDA也优于最先进的剪枝方法,验证了通过正则化训练结构化神经网络的好处。 摘要:This paper proposes an algorithm (RMDA) for training neural networks (NNs) with a regularization term for promoting desired structures. RMDA does not incur computation additional to proximal SGD with momentum, and achieves variance reduction without requiring the objective function to be of the finite-sum form. Through the tool of manifold identification from nonlinear optimization, we prove that after a finite number of iterations, all iterates of RMDA possess a desired structure identical to that induced by the regularizer at the stationary point of asymptotic convergence, even in the presence of engineering tricks like data augmentation and dropout that complicate the training process. Experiments on training NNs with structured sparsity confirm that variance reduction is necessary for such an identification, and show that RMDA thus significantly outperforms existing methods for this task. For unstructured sparsity, RMDA also outperforms a state-of-the-art pruning method, validating the benefits of training structured NNs through regularization.

【6】 Contextual Multi-View Query Learning for Short Text Classification in User-Generated Data 标题:用户生成数据中短文本分类的上下文多视图查询学习 链接:https://arxiv.org/abs/2112.02611

作者:Payam Karisani,Negin Karisani,Li Xiong 机构:Emory University, Purdue University 摘要:挖掘用户生成的内容——例如用于疫情的早期发现或提取个人观察——通常面临训练数据不足、文档长度短以及语言风格不规范等问题。我们提出了一种新的多视图主动学习模型,称为上下文感知协同测试与Bagging(COCOBA),用于解决针对查询词定制的分类任务中的这些问题,例如在给定疾病名称的情况下检测疾病报告。COCOBA利用用户帖子的上下文来构建两个视图,然后利用各视图中表示的分布来检测被分配到相反类别的区域。这可以有效地发现两个基学习器意见不一致的上下文。我们的模型还采用了委员会查询(query-by-committee)模型来应对用户帖子中通常含噪的语言。实验证明,我们的模型适用于多个重要且有代表性的Twitter任务,并且显著优于现有基线。 摘要:Mining user-generated content--e.g., for the early detection of outbreaks or for extracting personal observations--often suffers from the lack of enough training data, short document length, and informal language model. We propose a novel multi-view active learning model, called Context-aware Co-testing with Bagging (COCOBA), to address these issues in the classification tasks tailored for a query word--e.g., detecting illness reports given the disease name. COCOBA employs the context of user postings to construct two views. Then it uses the distribution of the representations in each view to detect the regions that are assigned to the opposite classes. This effectively leads to detecting the contexts that the two base learners disagree on. Our model also employs a query-by-committee model to address the usually noisy language of user postings. The experiments testify that our model is applicable to multiple important representative Twitter tasks and also significantly outperforms the existing baselines.

【7】 Face Trees for Expression Recognition 标题:用于表情识别的人脸树 链接:https://arxiv.org/abs/2112.02487

作者:Mojtaba Kolahdouzi,Alireza Sepas-Moghaddam,Ali Etemad 机构:Dept. ECE and Ingenuity Labs Research Institute, Queen’s University, Kingston, Canada 摘要:我们提出了一种端到端的人脸表情识别体系结构。我们的模型学习人脸标志的最佳树拓扑结构,通过遍历生成一个序列,我们从中获得一个嵌入来为序列学习者提供信息。提出的体系结构包含两个主流,一个侧重于地标位置以学习人脸结构,另一个侧重于地标周围的面片以学习纹理信息。每个流后面都有一个注意机制,输出被馈送到两流融合组件以执行最终分类。我们在两个大规模公开的面部表情数据集AffectNet和FER2013上进行了广泛的实验,以评估我们方法的有效性。我们的方法在这方面优于其他解决方案,并在这些数据集上设置了新的最先进的表达式识别率。 摘要:We propose an end-to-end architecture for facial expression recognition. Our model learns an optimal tree topology for facial landmarks, whose traversal generates a sequence from which we obtain an embedding to feed a sequential learner. The proposed architecture incorporates two main streams, one focusing on landmark positions to learn the structure of the face, while the other focuses on patches around the landmarks to learn texture information. Each stream is followed by an attention mechanism and the outputs are fed to a two-stream fusion component to perform the final classification. We conduct extensive experiments on two large-scale publicly available facial expression datasets, AffectNet and FER2013, to evaluate the efficacy of our approach. Our method outperforms other solutions in the area and sets new state-of-the-art expression recognition rates on these datasets.

【8】 Label Hierarchy Transition: Modeling Class Hierarchies to Enhance Deep Classifiers 标题:标签层次转换:对类层次进行建模以增强深度分类器 链接:https://arxiv.org/abs/2112.02353

作者:Renzhen Wang,De cai,Kaiwen Xiao,Xixi Jia,Xiao Han,Deyu Meng 机构:Xi’an Jiaotong University, Tencent, Xidian University 摘要:层次分类的目的是将对象归入一个类别层次结构。例如,一只鸟可以按照目、科、种的三级层次结构进行分类。现有方法通常将层次分类分解为多个多类分类任务来处理。然而,这种多任务学习策略未能充分利用不同层次中不同类别之间的相关性。在本文中,我们提出了标签层次转换(Label Hierarchy Transition),一个基于深度学习的统一概率框架,用于解决层次分类问题。具体地说,我们显式学习标签层次转换矩阵,其列向量表示两个相邻层次之间类别的条件标签分布,能够编码嵌入在类别层次中的相关性。我们进一步提出了一种混淆损失,鼓励分类网络在训练期间学习不同标签层次之间的相关性。所提出的框架只需稍作修改即可适配任何现有的深度网络。我们在三个具有不同类层次结构的公共基准数据集上进行了实验,结果表明我们的方法优于现有技术。源代码将公开提供。 摘要:Hierarchical classification aims to sort the object into a hierarchy of categories. For example, a bird can be categorized according to a three-level hierarchy of order, family, and species. Existing methods commonly address hierarchical classification by decoupling it into several multi-class classification tasks. However, such a multi-task learning strategy fails to fully exploit the correlation among various categories across different hierarchies. In this paper, we propose Label Hierarchy Transition, a unified probabilistic framework based on deep learning, to address hierarchical classification. Specifically, we explicitly learn the label hierarchy transition matrices, whose column vectors represent the conditional label distributions of classes between two adjacent hierarchies and could be capable of encoding the correlation embedded in class hierarchies. We further propose a confusion loss, which encourages the classification network to learn the correlation across different label hierarchies during training. The proposed framework can be adapted to any existing deep network with only minor modifications. We experiment with three public benchmark datasets with various class hierarchies, and the results demonstrate the superiority of our approach beyond the prior arts. Source code will be made publicly available.
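
下面用一个小的PyTorch示意(非论文官方实现)说明标签层次转换矩阵的作用:把"种"级别的预测分布通过一个列随机矩阵映射为"科"级别的分布;类别数量均为假设值,论文中的混淆损失此处仅以注释提及。

```python
import torch
import torch.nn.functional as F

# 示意:用标签层次转换矩阵把"种"级的预测分布映射为"科"级的分布。
# 矩阵每一列是给定某个"种"时其所属"科"的条件分布(用softmax保证列随机性)。
num_species, num_families = 20, 6
species_logits = torch.randn(4, num_species)                  # 骨干网络的种级输出
transition_logits = torch.randn(num_families, num_species)    # 可学习参数

species_prob = F.softmax(species_logits, dim=-1)              # (B, 种)
transition = F.softmax(transition_logits, dim=0)              # 每列归一化, (科, 种)
family_prob = species_prob @ transition.T                     # (B, 科)

print(family_prob.sum(dim=-1))                                # 每行和为1,仍是合法分布
# 训练时可分别对种、科两级施加交叉熵损失,并加入论文中的混淆损失约束层次间相关性
```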

表征(3篇)

【1】 VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning 标题:VarCLR:基于对比学习的可变语义表征预训练 链接:https://arxiv.org/abs/2112.02650

作者:Qibin Chen,Jeremy Lacomis,Edward J. Schwartz,Graham Neubig,Bogdan Vasilescu,Claire Le Goues 机构:Carnegie Mellon University, Software Engineering Institute 备注:Accepted by ICSE 2022 摘要:变量名对于传达程序的预期行为至关重要。基于机器学习的程序分析方法将变量名表示用于各种任务,例如建议新的变量名和错误检测。理想情况下,这类方法能够捕捉名称之间超越句法相似性的语义关系,例如变量名average和mean含义相近这一事实。不幸的是,以前的工作发现,即使是此前最好的表示方法也主要捕获相关性(两个变量是否有关联),而不是相似性(它们是否具有相同的含义)。我们提出了VarCLR,一种学习变量名语义表示的新方法,它能够在这种更严格的意义上有效捕获变量相似性。我们观察到,这个问题非常适合对比学习,其目标是最小化明确相似的输入之间的距离,同时最大化不相似输入之间的距离。这需要有标注的训练数据,因此我们构建了一个新的、弱监督的变量重命名数据集,该数据集挖掘自GitHub的代码编辑。我们表明,VarCLR能够将BERT等复杂的通用语言模型有效地应用于变量名表示,进而应用于变量名相似性搜索或拼写纠正等相关下游任务。VarCLR生成的模型在IdBench上显著优于最新技术,IdBench是一个显式刻画变量相似性(区别于相关性)的现有基准。最后,我们发布了全部数据、代码和预训练模型,旨在为现有或未来依赖变量名的程序分析所用的变量表示提供可直接替换的方案。 摘要:Variable names are critical for conveying intended program behavior. Machine learning-based program analysis methods use variable name representations for a wide range of tasks, such as suggesting new variable names and bug detection. Ideally, such methods could capture semantic relationships between names beyond syntactic similarity, e.g., the fact that the names average and mean are similar. Unfortunately, previous work has found that even the best of previous representation approaches primarily capture relatedness (whether two variables are linked at all), rather than similarity (whether they actually have the same meaning). We propose VarCLR, a new approach for learning semantic representations of variable names that effectively captures variable similarity in this stricter sense. We observe that this problem is an excellent fit for contrastive learning, which aims to minimize the distance between explicitly similar inputs, while maximizing the distance between dissimilar inputs. This requires labeled training data, and thus we construct a novel, weakly-supervised variable renaming dataset mined from GitHub edits. We show that VarCLR enables the effective application of sophisticated, general-purpose language models like BERT, to variable name representation and thus also to related downstream tasks like variable name similarity search or spelling correction. VarCLR produces models that significantly outperform the state-of-the-art on IdBench, an existing benchmark that explicitly captures variable similarity (as distinct from relatedness). Finally, we contribute a release of all data, code, and pre-trained models, aiming to provide a drop-in replacement for variable representations used in either existing or future program analyses that rely on variable names.
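
下面给出一个示意性的InfoNCE对比损失代码(非VarCLR的官方实现),说明"相似变量名对拉近、批内其他样本推远"的训练目标;嵌入向量此处用随机张量代替(论文中由BERT类编码器产生),温度参数为假设值。

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.07):
    """示意性的对比损失:第i个anchor只与第i个positive相似,批内其余样本视为负例。"""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.T / temperature               # (B, B) 余弦相似度
    labels = torch.arange(a.size(0))
    return F.cross_entropy(logits, labels)

# 假设某个编码器把变量名映射为向量;正例对可来自GitHub变量重命名数据,
# 例如 ("avg_cost", "average_cost")。此处用随机向量代替编码器输出。
batch = 16
anchor_emb = torch.randn(batch, 256, requires_grad=True)
positive_emb = torch.randn(batch, 256, requires_grad=True)
loss = info_nce(anchor_emb, positive_emb)
loss.backward()
print(float(loss))
```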

【2】 Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations 标题:交互式解缠:通过与原型表示交互来学习概念 链接:https://arxiv.org/abs/2112.02290

作者:Wolfgang Stammer,Marius Memmel,Patrick Schramowski,Kristian Kersting 机构:Technical University of Darmstadt, Computer Science Department, Technical University of Darmstadt, Centre for Cognitive Science 摘要:在没有强有力监督的情况下,从原始图像中学习视觉概念是一项具有挑战性的任务。在这项工作中,我们展示了原型表征在理解和修正神经概念学习者的潜在空间方面的优势。为此,我们引入了交互式概念交换网络(iCSNs),这是一种通过弱监督和隐式原型表征学习概念基础表征的新框架。ICSN通过交换成对图像的潜在表示,学习将概念信息绑定到特定的原型槽。这种语义基础和离散的潜在空间有助于人类理解和人机交互。我们通过对我们的新数据集“基本概念推理”(ECR)进行实验来支持这一说法,重点是几何对象共享的视觉概念。 摘要:Learning visual concepts from raw images without strong supervision is a challenging task. In this work, we show the advantages of prototype representations for understanding and revising the latent space of neural concept learners. For this purpose, we introduce interactive Concept Swapping Networks (iCSNs), a novel framework for learning concept-grounded representations via weak supervision and implicit prototype representations. iCSNs learn to bind conceptual information to specific prototype slots by swapping the latent representations of paired images. This semantically grounded and discrete latent space facilitates human understanding and human-machine interaction. We support this claim by conducting experiments on our novel data set "Elementary Concept Reasoning" (ECR), focusing on visual concepts shared by geometric objects.

【3】 BenchML: an extensible pipelining framework for benchmarking representations of materials and molecules at scale 标题:BenchML:一种可扩展的流水线框架,用于对材料和分子的规模化表示进行基准测试 链接:https://arxiv.org/abs/2112.02287

作者:Carl Poelking,Felix A. Faber,Bingqing Cheng 机构:Astex Pharmaceuticals, Cambridge, UK, Department of Chemistry, University of Cambridge, UK, Department of Physics, University of Cambridge, UK, The Institute of Science and Technology Austria, Am Campus, Klosterneuburg, Austria 摘要:我们介绍了一个机器学习(ML)框架,用于在材料和分子数据集上对化学体系的各种表示进行高通量基准测试。该基准测试方法的指导原则是评估原始描述符的性能:将模型复杂度限制在简单的回归方案内,同时遵循最佳ML实践,允许无偏的超参数优化,并通过沿一系列同步的训练-测试划分的学习曲线来评估学习进度。生成的模型旨在作为基线,为未来的方法开发提供参考,同时反映给定数据集学习的难易程度。通过对一系列物理化学、拓扑和几何表示的训练结果进行比较分析,我们深入了解了这些表示的相对优点及其相互关系。 摘要:We introduce a machine-learning (ML) framework for high-throughput benchmarking of diverse representations of chemical systems against datasets of materials and molecules. The guiding principle underlying the benchmarking approach is to evaluate raw descriptor performance by limiting model complexity to simple regression schemes while enforcing best ML practices, allowing for unbiased hyperparameter optimization, and assessing learning progress through learning curves along series of synchronized train-test splits. The resulting models are intended as baselines that can inform future method development, next to indicating how easily a given dataset can be learnt. Through a comparative analysis of the training outcome across a diverse set of physicochemical, topological and geometric representations, we glean insight into the relative merits of these representations as well as their interrelatedness.

编码器(3篇)

【1】 Conditional Deep Hierarchical Variational Autoencoder for Voice Conversion 标题:用于语音转换的条件深层次变分自动编码器 链接:https://arxiv.org/abs/2112.02796

作者:Kei Akuzawa,Kotaro Onishi,Keisuke Takiguchi,Kohki Mametani,Koichiro Mori 机构:DeNA Co., Ltd., Tokyo, Japan; University of Tokyo, Tokyo, Japan; The University of Electro-Communications, Tokyo, Japan 摘要:基于变分自编码器的语音转换(VAE-VC)的优点是只需要语音和说话人标签即可训练。与VAE-VC的大多数研究侧重于利用辅助损失或离散化潜在变量不同,本文研究了提高模型表达能力给VAE-VC带来的益处和影响。具体而言,我们首先从率失真的角度分析了VAE-VC,并指出模型表达能力对VAE-VC非常重要,因为率和失真反映了转换语音的相似度和自然度。基于该分析,我们提出了一种使用深度层次化VAE的新VC方法,它具有较高的模型表达能力,同时得益于非自回归解码器而具有较快的转换速度。此外,我们的分析还揭示了另一个问题:当VAE的潜在变量携带冗余信息时,相似度会下降。我们通过使用$\beta$-VAE目标控制潜在变量中包含的信息量来解决这个问题。在使用VCTK语料库的实验中,所提方法在跨性别转换设置下的自然度和相似度平均意见得分均高于3.5,高于现有基于自编码器的VC方法。 摘要:Variational autoencoder-based voice conversion (VAE-VC) has the advantage of requiring only pairs of speeches and speaker labels for training. Unlike the majority of the research in VAE-VC which focuses on utilizing auxiliary losses or discretizing latent variables, this paper investigates how an increasing model expressiveness has benefits and impacts on the VAE-VC. Specifically, we first analyze VAE-VC from a rate-distortion perspective, and point out that model expressiveness is significant for VAE-VC because rate and distortion reflect similarity and naturalness of converted speeches. Based on the analysis, we propose a novel VC method using a deep hierarchical VAE, which has high model expressiveness as well as having fast conversion speed thanks to its non-autoregressive decoder. Also, our analysis reveals another problem that similarity can be degraded when the latent variable of VAEs has redundant information. We address the problem by controlling the information contained in the latent variable using $\beta$-VAE objective. In the experiment using VCTK corpus, the proposed method achieved mean opinion scores higher than 3.5 on both naturalness and similarity in inter-gender settings, which are higher than the scores of existing autoencoder-based VC methods.
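
下面是一个最小的$\beta$-VAE目标函数示意(非论文官方实现),说明如何用$\beta$加权KL项来限制潜在变量携带的信息量;输入张量形状与$\beta$取值均为假设。

```python
import torch

def beta_vae_loss(recon, target, mu, logvar, beta=4.0):
    """示意:beta-VAE目标,用beta加权KL项限制潜在变量的信息量(率-失真权衡)。"""
    recon_loss = ((recon - target) ** 2).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    return recon_loss + beta * kl

# recon/target可以是语音的mel谱片段,mu/logvar来自编码器;此处用随机张量演示
recon, target = torch.randn(8, 80, 100), torch.randn(8, 80, 100)
mu, logvar = torch.randn(8, 64), torch.randn(8, 64)
print(float(beta_vae_loss(recon, target, mu, logvar)))
```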

【2】 Face Reconstruction with Variational Autoencoder and Face Masks 标题:基于变分自动编码器和人脸模板的人脸重建 链接:https://arxiv.org/abs/2112.02139

作者:Rafael S. Toledo,Eric A. Antonelo 机构:Department of Automation and Systems, Federal University of Santa Catarina (UFSC), Florian´opolis, Brazil 备注:12 pages, 7 figures, 18th Encontro Nacional de Intelig^encia Artificial e Computacional (ENIAC) 摘要:变分自动编码器(VAE)采用深度学习模型来学习高维观测数据集下的连续潜在z空间。有了它,许多任务成为可能,包括人脸重建和人脸合成。在这项工作中,我们研究了如何通过将学习限制在由面罩选择的像素上,面罩可以帮助训练用于人脸重建的VAE。使用celebA数据集对该方案进行的评估表明,使用面罩可以增强重建图像,特别是当SSIM损耗与l1或l2损耗函数一起使用时。我们注意到,在体系结构中包含用于面罩预测的解码器会影响l1或l2损失函数的性能,而SSIM损失的情况并非如此。此外,SSIM感知损失在所有测试假设之间产生了最清晰的样本,尽管它改变了图像的原始颜色,使得l1或l2损失与SSIM一起使用有助于解决此问题。 摘要:Variational AutoEncoders (VAE) employ deep learning models to learn a continuous latent z-space that is subjacent to a high-dimensional observed dataset. With that, many tasks are made possible, including face reconstruction and face synthesis. In this work, we investigated how face masks can help the training of VAEs for face reconstruction, by restricting the learning to the pixels selected by the face mask. An evaluation of the proposal using the celebA dataset shows that the reconstructed images are enhanced with the face masks, especially when SSIM loss is used either with l1 or l2 loss functions. We noticed that the inclusion of a decoder for face mask prediction in the architecture affected the performance for l1 or l2 loss functions, while this was not the case for the SSIM loss. Besides, SSIM perceptual loss yielded the crispest samples between all hypotheses tested, although it shifts the original color of the image, making the usage of the l1 or l2 losses together with SSIM helpful to solve this issue.
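
下面给出一个示意性的PyTorch代码片段(非论文官方实现),演示"只在面罩选中的像素上计算重建损失"的做法;此处以带掩码的L1损失为例,面罩用随机张量代替,论文中还结合了SSIM损失。

```python
import torch

def masked_l1_loss(recon, target, face_mask):
    """示意:只在面罩(face mask)选中的像素上计算重建损失,忽略背景区域。"""
    diff = (recon - target).abs() * face_mask
    # 按被选中的像素数(乘以通道数)归一化
    return diff.sum() / (face_mask.sum() * recon.size(1)).clamp(min=1.0)

recon = torch.rand(4, 3, 64, 64, requires_grad=True)   # VAE解码器输出
target = torch.rand(4, 3, 64, 64)                       # 原始人脸图像
face_mask = (torch.rand(4, 1, 64, 64) > 0.3).float()    # 1=人脸像素, 0=背景(此处随机代替)
loss = masked_l1_loss(recon, target, face_mask)
loss.backward()
print(float(loss))
```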

【3】 Deconfounding Temporal Autoencoder: Estimating Treatment Effects over Time Using Noisy Proxies 标题:去混杂时间自动编码器:使用有噪声的代理估计随时间的治疗效果 链接:https://arxiv.org/abs/2112.03013

作者:Milan Kuzmanovic,Tobias Hatt,Stefan Feuerriegel 机构:ETH Zurich, LMU Munich 备注:None 摘要:根据观察数据估计个体化治疗效果(ITEs)对决策至关重要。为了获得无偏的ITE估计,一个常见的假设是观察到所有的混杂因素。然而,在实践中,我们不太可能直接观察到这些混杂因素。相反,我们经常观察真实混杂因素的噪声测量,这可以作为有效的代理。在本文中,我们解决了在纵向环境中估计ITE的问题,在纵向环境中,我们观察噪声代理而不是真实的混杂因素。为此,我们开发了时间自动编码器,这是一种利用观察到的噪声代理来学习隐藏嵌入的新方法,它反映了真正隐藏的混杂因素。特别是,DTA结合了长短记忆自动编码器和因果正则化惩罚,使得潜在结果和治疗分配条件独立于已知的隐藏嵌入。一旦通过DTA学习到隐藏嵌入,就可以使用最先进的结果模型对其进行控制,并获得ITE的无偏估计。通过使用合成和真实世界的医疗数据,我们证明了我们的DTA的有效性,比最先进的基准提高了相当大的幅度。 摘要:Estimating individualized treatment effects (ITEs) from observational data is crucial for decision-making. In order to obtain unbiased ITE estimates, a common assumption is that all confounders are observed. However, in practice, it is unlikely that we observe these confounders directly. Instead, we often observe noisy measurements of true confounders, which can serve as valid proxies. In this paper, we address the problem of estimating ITE in the longitudinal setting where we observe noisy proxies instead of true confounders. To this end, we develop the Deconfounding Temporal Autoencoder, a novel method that leverages observed noisy proxies to learn a hidden embedding that reflects the true hidden confounders. In particular, the DTA combines a long short-term memory autoencoder with a causal regularization penalty that renders the potential outcomes and treatment assignment conditionally independent given the learned hidden embedding. Once the hidden embedding is learned via DTA, state-of-the-art outcome models can be used to control for it and obtain unbiased estimates of ITE. Using synthetic and real-world medical data, we demonstrate the effectiveness of our DTA by improving over state-of-the-art benchmarks by a substantial margin.

优化|敛散性(6篇)

【1】 Optimal No-Regret Learning in Strongly Monotone Games with Bandit Feedback 标题:强单调带Bandit反馈对策的最优无遗憾学习 链接:https://arxiv.org/abs/2112.02856

作者:Tianyi Lin,Zhengyuan Zhou,Wenjia Ba,Jiawei Zhang 机构:Department of Electrical Engineering and Computer Science; Stanford Graduate School of Business, Stanford University 备注:40 pages, 3 figures 摘要:我们研究带bandit反馈的未知博弈中的在线无遗憾学习,其中每个智能体在每个时刻只能观测到自己的奖励(由所有玩家当前的联合行动决定),而无法观测其梯度。我们重点关注光滑且强单调的博弈类,并研究其中的最优无遗憾学习。利用自和谐障碍函数,我们首先构造了一个在线bandit凸优化算法,并证明在光滑且强凹的收益函数下,该算法达到$\tilde{\Theta}(\sqrt{T})$的单智能体最优遗憾。然后我们证明,如果每个智能体在强单调博弈中应用这种无遗憾学习算法,则联合行动的最后迭代以$\tilde{\Theta}(1/\sqrt{T})$的速率收敛到唯一的纳什均衡。在我们的工作之前,同一类博弈中已知的最佳收敛速率是$O(1/T^{1/3})$(由另一种算法取得),因此最优无遗憾学习算法的问题一直悬而未决(已知的下界是$\Omega(1/\sqrt{T})$)。因此,我们的结果解决了这个开放性问题,并通过找出第一个双重最优的bandit学习算法,为博弈论bandit学习的广阔图景做出了贡献:它(在对数因子意义下)既在单智能体学习中达到最优遗憾,又在多智能体学习中达到最优的最后迭代收敛速率。我们还给出了若干模拟研究的结果——古诺竞争、凯利拍卖和分布式正则化逻辑回归——以证明我们算法的有效性。 摘要:We consider online no-regret learning in unknown games with bandit feedback, where each agent only observes its reward at each time -- determined by all players' current joint action -- rather than its gradient. We focus on the class of smooth and strongly monotone games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct an online bandit convex optimization algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(\sqrt{T})$ under smooth and strongly-concave payoff functions. We then show that if each agent applies this no-regret learning algorithm in strongly monotone games, the joint action converges in \textit{last iterate} to the unique Nash equilibrium at a rate of $\tilde{\Theta}(1/\sqrt{T})$. Prior to our work, the best-known convergence rate in the same class of games is $O(1/T^{1/3})$ (achieved by a different algorithm), thus leaving open the problem of optimal no-regret learning algorithms (since the known lower bound is $\Omega(1/\sqrt{T})$). Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning by identifying the first doubly optimal bandit learning algorithm, in that it achieves (up to log factors) both optimal regret in the single-agent learning and optimal last-iterate convergence rate in the multi-agent learning. We also present results on several simulation studies -- Cournot competition, Kelly auctions, and distributed regularized logistic regression -- to demonstrate the efficacy of our algorithm.

【2】 On the Convergence of Shallow Neural Network Training with Randomly Masked Neurons 标题:随机掩蔽神经元浅层神经网络训练的收敛性 链接:https://arxiv.org/abs/2112.02668

作者:Fangshuo Liao,Anastasios Kyrillidis 机构:Department of Computer Science, Rice University, Houston, TX, USA 摘要:给定一个稠密的浅层神经网络,我们专注于迭代地创建、训练并组合随机选择的子网络(代理函数),以训练完整的模型。通过仔细分析(i)子网络的神经切线核、(ii)代理函数的梯度以及(iii)我们采样和组合代理函数的方式,对于一个用于回归任务、采用ReLU激活的过参数化单隐层感知器,我们证明了训练误差在一个误差邻域内的线性收敛速率。我们的结果表明,对于固定的神经元选择概率,误差项随着代理模型数量的增加而减小,随着每个选定子网络局部训练步数的增加而增大。所考虑的框架推广了dropout训练、多样本dropout训练以及独立子网络训练,并为其提供了新的见解;对于每种情形,我们都给出了相应的收敛结果,作为主定理的推论。 摘要:Given a dense shallow neural network, we focus on iteratively creating, training, and combining randomly selected subnetworks (surrogate functions), towards training the full model. By carefully analyzing $i)$ the subnetworks' neural tangent kernel, $ii)$ the surrogate functions' gradient, and $iii)$ how we sample and combine the surrogate functions, we prove linear convergence rate of the training error -- within an error region -- for an overparameterized single-hidden layer perceptron with ReLU activations for a regression task. Our result implies that, for fixed neuron selection probability, the error term decreases as we increase the number of surrogate models, and increases as we increase the number of local training steps for each selected subnetwork. The considered framework generalizes and provides new insights on dropout training, multi-sample dropout training, as well as Independent Subnet Training; for each case, we provide corresponding convergence results, as corollaries of our main theorem.
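
下面用一个简短的PyTorch示意代码(非论文官方实现)说明"每轮随机采样隐藏神经元掩码、只训练被选中的子网络"的训练方式;网络规模、选择概率p、学习率与步数均为假设值。

```python
import torch

# 示意:每一轮随机采样一个隐藏神经元掩码,只训练被选中的子网络(代理函数),
# 多轮之后整体上训练完整模型;神经元选择概率p为假设值。
torch.manual_seed(0)
W1 = torch.randn(256, 10, requires_grad=True)   # 隐层权重 (隐藏单元数, 输入维度)
W2 = torch.randn(1, 256, requires_grad=True)    # 输出层权重
x, y = torch.randn(64, 10), torch.randn(64)
p = 0.5

for round_ in range(3):                          # 3个代理模型
    mask = (torch.rand(256) < p).float()         # 本轮被激活的隐藏神经元
    for _ in range(10):                          # 每个子网络的局部训练步数
        h = torch.relu(x @ W1.T) * mask          # 未选中的神经元输出被屏蔽
        pred = (h @ W2.T).squeeze(-1)
        loss = ((pred - y) ** 2).mean()
        loss.backward()
        with torch.no_grad():
            W1 -= 0.01 * W1.grad
            W2 -= 0.01 * W2.grad
            W1.grad.zero_()
            W2.grad.zero_()
print(float(loss))
```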

【3】 Optimization-Based Separations for Neural Networks 标题:基于优化的神经网络分离 链接:https://arxiv.org/abs/2112.02393

作者:Itay Safran,Jason D. Lee 机构:Princeton University 摘要:深度分离结果为深层神经网络相对于较浅结构的优势提供了一个可能的理论解释,证明了前者具有优越的逼近能力。然而,目前还没有已知的结果表明,更深层次的体系结构能够将这一优势转化为可证明的优化保证。我们证明,当数据由径向对称分布生成,且满足一些温和的假设时,梯度下降法可以使用具有两层S形激活的深度2神经网络有效地学习球指示器函数,并且隐层在整个训练过程中保持固定。众所周知,当使用具有单层非线性的深度2网络时,球指示器很难近似于某个重尾分布(Safran和Shamir,2017),这就确定了我们所知的最佳情况,第一个基于优化的分离结果是,更强体系结构的近似优势在实践中得到证明。我们的证明技术依赖于随机特征方法,该方法将问题简化为单神经元学习,当数据分布为重尾分布时,需要新的工具来显示梯度下降的收敛性。 摘要:Depth separation results propose a possible theoretical explanation for the benefits of deep neural networks over shallower architectures, establishing that the former possess superior approximation capabilities. However, there are no known results in which the deeper architecture leverages this advantage into a provable optimization guarantee. We prove that when the data are generated by a distribution with radial symmetry which satisfies some mild assumptions, gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations, and where the hidden layer is held fixed throughout training. Since it is known that ball indicators are hard to approximate with respect to a certain heavy-tailed distribution when using depth 2 networks with a single layer of non-linearities (Safran and Shamir, 2017), this establishes what is to the best of our knowledge, the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice. Our proof technique relies on a random features approach which reduces the problem to learning with a single neuron, where new tools are required to show the convergence of gradient descent when the distribution of the data is heavy-tailed.

【4】 Two-step Lookahead Bayesian Optimization with Inequality Constraints 标题:不等式约束下的两步超前贝叶斯优化 链接:https://arxiv.org/abs/2112.02833

作者:Yunxiang Zhang,Xiangyu Zhang,Peter I. Frazier 机构:Cornell University 摘要:计算效率高的非短视贝叶斯优化(BO)技术的最新进展,与传统的短视方法相比,提高了查询效率,如预期改进,但只略微增加了计算成本。然而,这些进步在很大程度上局限于无约束优化。对于约束优化,现有的少数非短视BO方法需要大量计算。例如,一种现有的非近视约束BO方法[Lam和Willcox,2017]依赖于蒙特卡罗卷展采集函数的计算成本高、不可靠、无蛮力导数的优化。使用重参数化技巧在无约束环境下对非近视采集函数进行更有效的基于导数的优化的方法,如样本平均近似和无穷小扰动分析,请勿扩展:约束会在采样采集函数曲面中引入不连续性,从而阻碍其优化。此外,我们认为,在约束问题中,非近视更为重要,因为对违反约束的恐惧会使近视方法远离可行区域和不可行区域之间的边界采样,从而减缓紧约束最优解的发现。在本文中,我们提出了一个计算效率高的两步前瞻约束贝叶斯优化获取函数(2-OPT-C),支持顺序和批设置。为了实现快速捕获函数优化,我们开发了一种新的基于似然比的两步最优捕获函数梯度无偏估计器,该估计器不使用重新参数化技巧。在数值实验中,2-OPT-C通常比以前的方法提高查询效率2倍或更多,在某些情况下提高10倍或更多。 摘要:Recent advances in computationally efficient non-myopic Bayesian optimization (BO) improve query efficiency over traditional myopic methods like expected improvement while only modestly increasing computational cost. These advances have been largely limited, however, to unconstrained optimization. For constrained optimization, the few existing non-myopic BO methods require heavy computation. For instance, one existing non-myopic constrained BO method [Lam and Willcox, 2017] relies on computationally expensive unreliable brute-force derivative-free optimization of a Monte Carlo rollout acquisition function. Methods that use the reparameterization trick for more efficient derivative-based optimization of non-myopic acquisition functions in the unconstrained setting, like sample average approximation and infinitesimal perturbation analysis, do not extend: constraints introduce discontinuities in the sampled acquisition function surface that hinder its optimization. Moreover, we argue here that being non-myopic is even more important in constrained problems because fear of violating constraints pushes myopic methods away from sampling the boundary between feasible and infeasible regions, slowing the discovery of optimal solutions with tight constraints. In this paper, we propose a computationally efficient two-step lookahead constrained Bayesian optimization acquisition function (2-OPT-C) supporting both sequential and batch settings. To enable fast acquisition function optimization, we develop a novel likelihood-ratio-based unbiased estimator of the gradient of the two-step optimal acquisition function that does not use the reparameterization trick. In numerical experiments, 2-OPT-C typically improves query efficiency by 2x or more over previous methods, and in some cases by 10x or more.

【5】 Nonparametric mixture MLEs under Gaussian-smoothed optimal transport distance 标题:高斯平滑最优传输距离下的非参数混合MLE 链接:https://arxiv.org/abs/2112.02421

作者:Fang Han,Zhen Miao,Yandi Shen 备注:26 pages 摘要:高斯平滑最优传输(GOT)框架由Goldfeld等人(2020)首创,并在随后的一系列论文中得到发展,很快引起了统计学、机器学习、信息论和相关领域研究人员的注意。其中的一个关键观察结果是:通过采用GOT框架而非其非光滑的对应框架,使用经验测度逼近真实数据生成分布时的维数灾难可以被消除。本文证明,一个相关的观察结果同样适用于离散指数族模型中非参数混合分布的估计:在GOT代价下,非参数极大似然估计(MLE)的估计精度可以加速到多项式速率。这与基于非光滑度量的经典次多项式速率形成鲜明对比,而后者从信息论的角度无法改进。我们分析的关键步骤是建立高斯卷积Lipschitz函数的一个新的Jackson型近似界。这一洞察在分析非参数MLE的现有技术与新的GOT框架之间架起了桥梁。 摘要:The Gaussian-smoothed optimal transport (GOT) framework, pioneered in Goldfeld et al. (2020) and followed up by a series of subsequent papers, has quickly caught attention among researchers in statistics, machine learning, information theory, and related fields. One key observation made therein is that, by adapting to the GOT framework instead of its unsmoothed counterpart, the curse of dimensionality for using the empirical measure to approximate the true data generating distribution can be lifted. The current paper shows that a related observation applies to the estimation of nonparametric mixing distributions in discrete exponential family models, where under the GOT cost the estimation accuracy of the nonparametric MLE can be accelerated to a polynomial rate. This is in sharp contrast to the classical sub-polynomial rates based on unsmoothed metrics, which cannot be improved from an information-theoretical perspective. A key step in our analysis is the establishment of a new Jackson-type approximation bound of Gaussian-convoluted Lipschitz functions. This insight bridges existing techniques of analyzing the nonparametric MLEs and the new GOT framework.
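下面给出一个最小示意(非论文实现),用一维样本说明"高斯平滑"度量的做法:比较经验样本与真实分布样本之间的Wasserstein-1距离,以及对双方样本加入同方差高斯噪声(等价于与高斯核卷积后比较)之后的距离;其中平滑参数 sigma、样本量均为假设取值。

```python
# 最小示意:高斯平滑最优传输(GOT)思想的一维数值实验(非论文实现)
# 假设:平滑参数 sigma=0.5、样本量等均为示例取值
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
sigma = 0.5          # 高斯平滑参数(假设值)
n = 200              # 经验样本量

# "真实"分布:标准正态;经验分布:其 n 个样本
true_samples = rng.normal(0.0, 1.0, size=50_000)
emp_samples = rng.normal(0.0, 1.0, size=n)

# 非平滑的 Wasserstein-1 距离(基于样本)
w1_plain = wasserstein_distance(emp_samples, true_samples)

# 高斯平滑:两侧样本各加 N(0, sigma^2) 噪声,相当于与高斯核卷积后再比较
w1_smoothed = wasserstein_distance(
    emp_samples + rng.normal(0.0, sigma, size=n),
    true_samples + rng.normal(0.0, sigma, size=true_samples.size),
)

print(f"W1(empirical, true)          = {w1_plain:.4f}")
print(f"W1 after Gaussian smoothing  = {w1_smoothed:.4f}")
```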

【6】 Breaking the Convergence Barrier: Optimization via Fixed-Time Convergent Flows 标题:打破收敛障碍:基于固定时间收敛流的优化 链接:https://arxiv.org/abs/2112.01363

作者:Param Budhraja,Mayank Baranwal,Kunal Garg,Ashish Hota 机构: Indian Institute of Technology Kharagpur, Tata Consultancy Services Research & Innovation, Mumbai, University of California, Santa Cruz 备注:Accepted at AAAI Conference on Artificial Intelligence, 2022, to appear 摘要:加速梯度法是机器学习和其他数据分析领域中自然产生的大规模数据驱动优化问题的基础。我们引入了一个基于梯度的优化框架来实现加速,基于最近引入的动态系统的固定时间稳定性的概念。该方法是简单的基于梯度的方法的推广,可适当缩放,以在固定时间内收敛到优化器,与初始化无关。我们首先利用连续时间框架来设计固定时间稳定的动态系统,然后提供一致的离散化策略,使得等效离散时间算法在实际固定的迭代次数内跟踪优化器,从而实现这一点。我们还从理论上分析了所提出的梯度流的收敛性,以及它们对一系列服从强凸性、严格凸性和可能非凸性但满足Polyak-{L}ojasiewicz不等式的函数的加性扰动的鲁棒性。我们还证明了由于固定时间收敛,收敛速度上的遗憾界是常数。超参数具有直观的解释,并且可以进行调整,以符合所需收敛速度的要求。我们通过一系列数值算例验证了所提格式的加速收敛性,并与最新的优化算法进行了比较。我们的工作为通过连续时间流的离散化开发新的优化算法提供了见解。 摘要:Accelerated gradient methods are the cornerstones of large-scale, data-driven optimization problems that arise naturally in machine learning and other fields concerning data analysis. We introduce a gradient-based optimization framework for achieving acceleration, based on the recently introduced notion of fixed-time stability of dynamical systems. The method presents itself as a generalization of simple gradient-based methods suitably scaled to achieve convergence to the optimizer in a fixed-time, independent of the initialization. We achieve this by first leveraging a continuous-time framework for designing fixed-time stable dynamical systems, and later providing a consistent discretization strategy, such that the equivalent discrete-time algorithm tracks the optimizer in a practically fixed number of iterations. We also provide a theoretical analysis of the convergence behavior of the proposed gradient flows, and their robustness to additive disturbances for a range of functions obeying strong convexity, strict convexity, and possibly nonconvexity but satisfying the Polyak-{L}ojasiewicz inequality. We also show that the regret bound on the convergence rate is constant by virtue of the fixed-time convergence. The hyperparameters have intuitive interpretations and can be tuned to fit the requirements on the desired convergence rates. We validate the accelerated convergence properties of the proposed schemes on a range of numerical examples against the state-of-the-art optimization algorithms. Our work provides insights on developing novel optimization algorithms via discretization of continuous-time flows.
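下面给出一个带假设参数的示意性离散化:把普通梯度下降与一种"按梯度范数重新缩放"的固定时间型流(形如对梯度除以其范数的不同幂次后求和)在强凸二次目标上作对比。该格式与参数(c1、c2、alpha、beta、步长等)仅用于说明思想,并非论文中的精确算法。

```python
# 示意:固定时间型梯度流的简单显式离散化与普通梯度下降对比(非论文的精确格式)
# 假设:c1=c2=1, alpha=0.5, beta=1.5, 步长与迭代数均为示例取值
import numpy as np

A = np.diag([1.0, 10.0])           # 条件数为 10 的二次目标 f(x)=0.5*x^T A x

def grad(x):
    return A @ x

x_gd = np.array([5.0, 5.0])
x_fx = x_gd.copy()
eta = 0.05                          # 步长(示例取值)
c1, c2, alpha, beta = 1.0, 1.0, 0.5, 1.5

for t in range(200):
    # 普通梯度下降
    x_gd = x_gd - eta * grad(x_gd)

    # 固定时间型流:对梯度按其范数做两种幂次的重新缩放后再下降
    g = grad(x_fx)
    norm = np.linalg.norm(g) + 1e-12
    x_fx = x_fx - eta * (c1 * g / norm**(1 - alpha) + c2 * g / norm**(1 - beta))

print("GD  final ||x|| :", np.linalg.norm(x_gd))
print("FxT final ||x|| :", np.linalg.norm(x_fx))
```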

预测|估计(8篇)

【1】 A Novel Prediction Setup for Online Speed-Scaling 标题:一种用于在线调速的新型预测设置 链接:https://arxiv.org/abs/2112.03082

作者:Antonios Antoniadis,Peyman Jabbarzade Ganje,Golnoosh Shahkarami 机构:Max Planck Institut f¨ur Informatik, Universit¨at des Saarlandes 摘要:考虑到数据中心和计算系统对能源的需求总体上快速增长,在设计(调度)算法时考虑能源是至关重要的。机器学习在实践中是一种有用的方法,它可以根据历史数据预测系统的未来负载。然而,这种方法的有效性在很大程度上取决于预测的质量,当预测低于标准时,可能远远不是最优的。另一方面,在提供最坏情况保证的同时,经典在线算法可能对实践中出现的大类输入感到悲观。本文本着机器学习增强算法这一新领域的精神,试图在经典的、基于截止日期的在线速度缩放问题上实现两全其美:在引入新的预测设置的基础上,我们开发了以下算法:(i)在存在充分预测的情况下获得可证明的低能耗,并且(ii)对不充分预测具有鲁棒性,并且(iii)是平滑的,即,它们的性能随着预测误差的增加而逐渐降低。 摘要:Given the rapid rise in energy demand by data centers and computing systems in general, it is fundamental to incorporate energy considerations when designing (scheduling) algorithms. Machine learning can be a useful approach in practice by predicting the future load of the system based on, for example, historical data. However, the effectiveness of such an approach highly depends on the quality of the predictions and can be quite far from optimal when predictions are sub-par. On the other hand, while providing a worst-case guarantee, classical online algorithms can be pessimistic for large classes of inputs arising in practice. This paper, in the spirit of the new area of machine learning augmented algorithms, attempts to obtain the best of both worlds for the classical, deadline based, online speed-scaling problem: Based on the introduction of a novel prediction setup, we develop algorithms that (i) obtain provably low energy-consumption in the presence of adequate predictions, and (ii) are robust against inadequate predictions, and (iii) are smooth, i.e., their performance gradually degrades as the prediction error increases.

【2】 Pairwise Learning for Neural Link Prediction 标题:神经链路预测的成对学习方法 链接:https://arxiv.org/abs/2112.02936

作者:Zhitao Wang,Yong Zhou,Litao Hong,Yuanhang Zou,Hanjing Su 机构:WeChat Pay, Tencent, WeChat Search, Tencent 摘要:在本文中,我们旨在提供一个有效的成对学习神经链路预测(PLNLP)框架。该框架将链路预测视为一个成对学习排序问题,由四个主要部分组成,即邻域编码器、链路预测器、负采样器和目标函数。该框架是灵活的,任何通用的图神经卷积或链路预测专用的神经结构都可以用作邻域编码器。对于链路预测器,我们设计了不同的评分函数,可以根据不同类型的图进行选择。在负采样器中,我们提供了几种特定于问题的采样策略。对于目标函数,我们建议使用一个有效的排序损失,它近似地最大化标准排序度量AUC。我们在开放图基准(OGB)的4个链路属性预测数据集上评估了所提出的PLNLP框架,包括ogbl-ddi、ogbl-collab、ogbl-ppa和ogbl-citation2。仅使用基本的神经结构,PLNLP在ogbl-ddi上达到第1名,在ogbl-collab和ogbl-citation2上达到第2名。这一性能证明了PLNLP的有效性。 摘要:In this paper, we aim at providing an effective Pairwise Learning Neural Link Prediction (PLNLP) framework. The framework treats link prediction as a pairwise learning to rank problem and consists of four main components, i.e., neighborhood encoder, link predictor, negative sampler and objective function. The framework is flexible in that any generic graph neural convolution or link prediction specific neural architecture could be employed as neighborhood encoder. For link predictor, we design different scoring functions, which could be selected based on different types of graphs. In negative sampler, we provide several sampling strategies, which are problem specific. As for objective function, we propose to use an effective ranking loss, which approximately maximizes the standard ranking metric AUC. We evaluate the proposed PLNLP framework on 4 link property prediction datasets of Open Graph Benchmark, including ogbl-ddi, ogbl-collab, ogbl-ppa and ogbl-citation2. PLNLP achieves Top 1 performance on ogbl-ddi, and Top 2 performance on ogbl-collab and ogbl-citation2 only with basic neural architecture. The performance demonstrates the effectiveness of PLNLP.
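作为参考,下面给出用成对排序损失近似最大化AUC的一种常见形式的最小示意(成对logistic损失),其中打分函数用随机初始化的线性层代替真实的邻域编码器,正负样本边的表示为随机构造,仅用于说明损失的计算方式,并非PLNLP的官方实现。

```python
# 最小示意:用成对 logistic 损失近似最大化 AUC(非 PLNLP 官方实现)
# 假设:打分函数为简单线性层,正/负样本边的表示为随机张量(示例数据)
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim = 16
scorer = torch.nn.Linear(dim, 1)           # 代替"邻域编码器 + 链路预测器"
opt = torch.optim.Adam(scorer.parameters(), lr=1e-2)

pos_edges = torch.randn(512, dim) + 0.5    # 正样本边的表示
neg_edges = torch.randn(512, dim) - 0.5    # 负采样得到的负样本边表示

for step in range(100):
    s_pos = scorer(pos_edges).squeeze(-1)
    s_neg = scorer(neg_edges).squeeze(-1)
    # 成对排序损失:鼓励 s_pos > s_neg,可视为 AUC 的可微替代
    loss = F.softplus(s_neg - s_pos).mean()
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    pairwise_acc = (scorer(pos_edges) > scorer(neg_edges)).float().mean()
print("成对比较正确率(AUC 的粗略估计):", pairwise_acc.item())
```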

【3】 Parameter Efficient Deep Probabilistic Forecasting 标题:参数有效的深度概率预测 链接:https://arxiv.org/abs/2112.02905

作者:Olivier Sprangers,Sebastian Schelter,Maarten de Rijke 机构:University of Amsterdam 备注:Accepted as journal paper to the International Journal of Forecasting 摘要:概率时间序列预测在零售、电子商务、金融或生物等许多应用领域中至关重要。随着大量数据可用性的提高,许多神经结构被提出用于解决这个问题。特别是,基于Transformer的方法在现实世界基准上实现了最先进的性能。然而,这些方法需要学习大量的参数,这对训练此类模型的计算资源提出了很高的内存要求。为了解决这个问题,我们引入了一种新的双向时间卷积网络(BiTCN),它需要的参数比普通的基于Transformer的方法少一个数量级。我们的模型结合了两个时间卷积网络(TCN):第一个网络编码时间序列的未来协变量,而第二个网络编码过去的观测值和协变量。我们通过这两个网络联合估计输出分布的参数。在四个真实世界数据集上的实验表明,在大多数情况下,我们的方法在两个点度量(sMAPE、NRMSE)以及一组区间度量(分位数损失分位点)上,与四种最先进的概率预测方法(包括基于Transformer的方法和WaveNet)表现相当。其次,我们证明了我们的方法比基于Transformer的方法所需的参数要少得多,这意味着模型可以更快地训练,内存需求显著降低,从而降低了部署这些模型的基础设施成本。 摘要:Probabilistic time series forecasting is crucial in many application domains such as retail, ecommerce, finance, or biology. With the increasing availability of large volumes of data, a number of neural architectures have been proposed for this problem. In particular, Transformer-based methods achieve state-of-the-art performance on real-world benchmarks. However, these methods require a large number of parameters to be learned, which imposes high memory requirements on the computational resources for training such models. To address this problem, we introduce a novel Bidirectional Temporal Convolutional Network (BiTCN), which requires an order of magnitude less parameters than a common Transformer-based approach. Our model combines two Temporal Convolutional Networks (TCNs): the first network encodes future covariates of the time series, whereas the second network encodes past observations and covariates. We jointly estimate the parameters of an output distribution via these two networks. Experiments on four real-world datasets show that our method performs on par with four state-of-the-art probabilistic forecasting methods, including a Transformer-based approach and WaveNet, on two point metrics (sMAPE, NRMSE) as well as on a set of range metrics (quantile loss percentiles) in the majority of cases. Secondly, we demonstrate that our method requires significantly less parameters than Transformer-based methods, which means the model can be trained faster with significantly lower memory requirements, which as a consequence reduces the infrastructure cost for deploying these models.
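下面是一个膨胀因果卷积(TCN 的基本构件)的最小 PyTorch 示意,说明"用卷积编码过去观测"这一思路;通道数、膨胀率等均为假设取值,并非论文中 BiTCN 的原始实现。

```python
# 最小示意:膨胀因果一维卷积块(TCN 的基本构件,非 BiTCN 原始实现)
# 假设:通道数、卷积核大小、膨胀率均为示例取值
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """左侧补零以保证因果性:t 时刻的输出只依赖 <= t 的输入。"""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                        # x: [batch, channels, time]
        x = nn.functional.pad(x, (self.pad, 0))  # 只在时间维左侧补零
        return self.conv(x)

class TCNBlock(nn.Module):
    def __init__(self, ch, kernel_size=3, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.layers = nn.ModuleList(
            [CausalConv1d(ch, ch, kernel_size, d) for d in dilations]
        )

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x)) + x         # 残差连接
        return x

x = torch.randn(8, 16, 96)                       # 8 条序列、16 个通道、96 个时间步
print(TCNBlock(16)(x).shape)                     # torch.Size([8, 16, 96])
```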

【4】 ES-dRNN: A Hybrid Exponential Smoothing and Dilated Recurrent Neural Network Model for Short-Term Load Forecasting 标题:ES-dRNN:一种用于短期负荷预测的指数平滑与扩张递归神经网络混合模型 链接:https://arxiv.org/abs/2112.02663

作者:Slawek Smyl,Grzegorz Dudek,Paweł Pełka 机构: Pełka are with the Department of Electrical Engineering, Czestochowa University of Technology 摘要:短期负荷预测(STLF)具有挑战性,因为复杂的时间序列(TS)表示三种季节性模式和非线性趋势。本文提出了一种新的混合分层深度学习模型,该模型处理多个季节性,同时产生点预测和预测区间(PI)。它结合了指数平滑(ES)和递归神经网络(RNN)。ES动态提取每个TS的主要组件,并启用动态去季节化,这在操作相对较小的数据集时特别有用。多层RNN配备了一种新型的扩展递归单元,用于有效地对TS中的短期和长期依赖关系进行建模。为了改进内部TS表示,从而提高模型的性能,RNN同时学习ES参数和将输入转换为预测的主映射函数。在35个欧洲国家的STLF问题上,我们将我们的方法与几种基线方法进行比较,包括经典统计方法和机器学习(ML)方法。实证研究表明,该模型对具有多个季节性和显著随机波动的非线性随机预测问题具有很高的表达能力。事实上,它在准确性方面优于统计和最先进的ML模型。 摘要:Short-term load forecasting (STLF) is challenging due to complex time series (TS) which express three seasonal patterns and a nonlinear trend. This paper proposes a novel hybrid hierarchical deep learning model that deals with multiple seasonality and produces both point forecasts and predictive intervals (PIs). It combines exponential smoothing (ES) and a recurrent neural network (RNN). ES extracts dynamically the main components of each individual TS and enables on-the-fly deseasonalization, which is particularly useful when operating on a relatively small data set. A multi-layer RNN is equipped with a new type of dilated recurrent cell designed to efficiently model both short and long-term dependencies in TS. To improve the internal TS representation and thus the model's performance, RNN learns simultaneously both the ES parameters and the main mapping function transforming inputs into forecasts. We compare our approach against several baseline methods, including classical statistical methods and machine learning (ML) approaches, on STLF problems for 35 European countries. The empirical study clearly shows that the proposed model has high expressive power to solve nonlinear stochastic forecasting problems with TS including multiple seasonality and significant random fluctuations. In fact, it outperforms both statistical and state-of-the-art ML models in terms of accuracy.
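为说明"指数平滑动态提取主要分量、实现在线去季节化"的思路,下面给出一个仅含水平与单一季节分量的加法式指数平滑去季节化草图(类似 Holt-Winters 的简化版);平滑系数与季节周期均为假设取值,与论文中 ES-dRNN 的具体公式无关。

```python
# 最小示意:加法式指数平滑去季节化(简化的 Holt-Winters 风格,非 ES-dRNN 原公式)
# 假设:alpha=0.3, gamma=0.1, 季节周期 m=24(如小时级负荷的日周期)
import numpy as np

def deseasonalize(y, m=24, alpha=0.3, gamma=0.1):
    level = y[:m].mean()
    season = y[:m] - level                     # 用第一个周期初始化季节项
    resid = np.empty_like(y, dtype=float)
    for t in range(len(y)):
        s = season[t % m]
        resid[t] = y[t] - level - s            # 去水平、去季节后的残差,可作为 RNN 的输入
        new_level = alpha * (y[t] - s) + (1 - alpha) * level
        season[t % m] = gamma * (y[t] - new_level) + (1 - gamma) * s
        level = new_level
    return resid

rng = np.random.default_rng(0)
t = np.arange(24 * 14)                          # 两周的小时级序列
y = 100 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, t.size)
print("去季节化前后标准差:", y.std().round(2), deseasonalize(y).std().round(2))
```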

【5】 PreGAN: Preemptive Migration Prediction Network for Proactive Fault-Tolerant Edge Computing 标题:PreGAN:面向主动容错边缘计算的抢占式迁移预测网络 链接:https://arxiv.org/abs/2112.02292

作者:Shreshth Tuli,Giuliano Casale,Nicholas R. Jennings 机构:∗Imperial College London, †Loughborough University 备注:Accepted in Infocom 2022 摘要:由于边缘设备的不可靠性和现代应用程序严格的服务期限,构建能够快速响应节点过载或故障的容错边缘系统具有挑战性。此外,不必要的任务迁移可能会给系统网络带来压力,因此需要一种智能且节省的故障恢复方案。以前的方法通常无法适应高度不稳定的工作负载,也无法准确地检测和诊断故障以进行最佳修复。因此,需要一个健壮的、主动的容错机制来满足服务级别目标。在这项工作中,我们提出了PreGAN,这是一个复合AI模型,使用生成性对抗网络(GAN)预测先发制人的迁移决策,以便在集装箱化边缘部署中实现主动容错。PreGAN与GAN一起使用联合仿真来学习一些镜头异常分类器,并主动预测迁移决策,以实现可靠的计算。在基于Raspberry Pi的边缘环境上进行的大量实验表明,PreGAN在故障检测、诊断和分类方面优于最先进的基线方法,从而实现高质量的服务。与所考虑的基线中的最佳方法相比,PreGAN通过提高5.1%的故障检测精度、更高的诊断分数和23.8%的开销来实现这一点。 摘要:Building a fault-tolerant edge system that can quickly react to node overloads or failures is challenging due to the unreliability of edge devices and the strict service deadlines of modern applications. Moreover, unnecessary task migrations can stress the system network, giving rise to the need for a smart and parsimonious failure recovery scheme. Prior approaches often fail to adapt to highly volatile workloads or accurately detect and diagnose faults for optimal remediation. There is thus a need for a robust and proactive fault-tolerance mechanism to meet service level objectives. In this work, we propose PreGAN, a composite AI model using a Generative Adversarial Network (GAN) to predict preemptive migration decisions for proactive fault-tolerance in containerized edge deployments. PreGAN uses co-simulations in tandem with a GAN to learn a few-shot anomaly classifier and proactively predict migration decisions for reliable computing. Extensive experiments on a Raspberry-Pi based edge environment show that PreGAN can outperform state-of-the-art baseline methods in fault-detection, diagnosis and classification, thus achieving high quality of service. PreGAN accomplishes this by 5.1% more accurate fault detection, higher diagnosis scores and 23.8% lower overheads compared to the best method among the considered baselines.

【6】 Combining Embeddings and Fuzzy Time Series for High-Dimensional Time Series Forecasting in Internet of Energy Applications 标题:嵌入与模糊时间序列相结合的高维时间序列预测在能源互联网中的应用 链接:https://arxiv.org/abs/2112.02140

作者:Hugo Vinicius Bitencourt,Luiz Augusto Facury de Souza,Matheus Cascalho dos Santos,Petrônio Cândido de Lima e Silva,Frederico Gadelha Guimarães 机构:Lima e Silvac, Frederico Gadelha Guimar˜aesb,∗, Graduate Program in Electrical Engineering, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil 备注:18 pages, 3 figures 摘要:住宅用电预测对于帮助智能电网管理和保存能源以确保高效使用至关重要。在客户层面进行准确的能源预测将直接反映整个电网系统的效率提高,然而,由于气象和占用模式等诸多影响因素,预测建筑能源使用是一项复杂的任务。此外,随着多传感器环境的出现以及能源消费者与智能电网之间的双向通信,高维时间序列越来越多地出现在能源互联网(IoE)中。因此,能够计算高维时间序列的方法在智能建筑和IoE应用中具有重要价值。模糊时间序列(FTS)模型作为数据驱动的非参数模型,具有易于实现和高精度的特点。不幸的是,如果使用所有特征来训练模型,现有的FTS模型可能不可行。我们提出了一种处理高维时间序列的新方法,通过将原始高维数据投影到低维嵌入空间,并在这种低维表示中使用多元FTS方法。结合这些技术可以更好地表示多元时间序列的复杂内容和更准确的预测。 摘要:The prediction of residential power usage is essential in assisting a smart grid to manage and preserve energy to ensure efficient use. An accurate energy forecasting at the customer level will reflect directly into efficiency improvements across the power grid system, however forecasting building energy use is a complex task due to many influencing factors, such as meteorological and occupancy patterns. In addiction, high-dimensional time series increasingly arise in the Internet of Energy (IoE), given the emergence of multi-sensor environments and the two way communication between energy consumers and the smart grid. Therefore, methods that are capable of computing high-dimensional time series are of great value in smart building and IoE applications. Fuzzy Time Series (FTS) models stand out as data-driven non-parametric models of easy implementation and high accuracy. Unfortunately, the existing FTS models can be unfeasible if all features were used to train the model. We present a new methodology for handling high-dimensional time series, by projecting the original high-dimensional data into a low dimensional embedding space and using multivariate FTS approach in this low dimensional representation. Combining these techniques enables a better representation of the complex content of multivariate time series and more accurate forecasts.

【7】 L2-norm Ensemble Regression with Ocean Feature Weights by Analyzed Images for Flood Inflow Forecast 标题:基于分析图像提取的海洋特征权重的L2范数集成回归用于洪水入流预报 链接:https://arxiv.org/abs/2112.03108

作者:Takato Yasuno,Masazumi Amakata,Junichiro Fujii,Masahiro Okano,Riku Ogata 机构:Research Institute for Infrastructure Paradigm Shift, YACHIYO Engineering, Co.,Ltd 备注:10 pages, 10 figures 摘要:预测大坝涌水量对减轻洪水灾害十分重要。过程线提供关键信息,如开始时间、峰值水平和体积。特别是,大坝管理要求根据未来的水文曲线,提前6小时进行大坝流入预测。作者提出了一种新的目标流入权重来创建从分析的海面图像中提取的海洋特征向量。我们在预先训练的VGG16网络的fc6层提取了4096个维度向量元素。随后,我们将其简化为t-SNE的三维。此外,我们还利用主成分分析(PCA)建立了海温权重的主成分。通过数值实验,我们发现这些权重有助于预测重要性的稳定性。作为基本回归模型,我们用核展开法校准最小二乘法,分位数随机林最小化袋外误差,以及多项式核支持向量回归。当我们计算预测器重要性时,我们将我们提出的权重引入的每个变量重要性的稳定性与没有权重的其他结果进行比较。我们将我们的方法应用于日本关东地区的一座大坝,重点关注2007年至2018年的训练期,以及6月至10月的有限洪水期。我们测试2019年洪水期的精度。最后,我们给出了未知洪水预报的应用结果和进一步的统计学习。 摘要:It is important to forecast dam inflow for flood damage mitigation. The hydrograph provides critical information such as the start time, peak level, and volume. Particularly, dam management requires a 6-h lead time of the dam inflow forecast based on a future hydrograph. The authors propose novel target inflow weights to create an ocean feature vector extracted from the analyzed images of the sea surface. We extracted 4,096 elements of the dimension vector in the fc6 layer of the pre-trained VGG16 network. Subsequently, we reduced it to three dimensions of t-SNE. Furthermore, we created the principal component of the sea temperature weights using PCA. We found that these weights contribute to the stability of predictor importance by numerical experiments. As base regression models, we calibrate the least squares with kernel expansion, the quantile random forest minimized out-of bag error, and the support vector regression with a polynomial kernel. When we compute the predictor importance, we visualize the stability of each variable importance introduced by our proposed weights, compared with other results without weights. We apply our method to a dam at Kanto region in Japan and focus on the trained term from 2007 to 2018, with a limited flood term from June to October. We test the accuracy over the 2019 flood term. Finally, we present the applied results and further statistical learning for unknown flood forecast.

【8】 Prediction and compression of lattice QCD data using machine learning algorithms on quantum annealer 标题:基于量子退火机上机器学习算法的晶格QCD数据预测与压缩 链接:https://arxiv.org/abs/2112.02120

作者:Boram Yoon,Chia Cheng Chang,Garrett T. Kenyon,Nga T. T. Nguyen,Ermal Rrapaj 机构:CCS-, Computer, Computational and Statistical Sciences Division, Los Alamos National Laboratory,Los, Alamos, NM , USA, RIKEN iTHEMS, Wako, Saitama ,-, Japan, Department of Physics, University of California, Berkeley, California , USA 备注:None 摘要:我们利用量子退火机的高效二进制优化能力,提出了晶格QCD数据的回归和压缩算法。在回归算法中,我们将输入和输出变量之间的相关性编码为稀疏编码机器学习算法。训练后的相关模式用于从晶格上测量的其他观测值预测不可见晶格构型的晶格QCD观测值。在压缩算法中,我们定义了从浮点数的晶格QCD数据到二进制系数的映射,二进制系数从一组基向量紧密地重构输入数据。由于重建不精确,映射定义了有损压缩,但是,合理数量的二进制系数能够重建晶格QCD数据的输入向量,重建误差远小于统计波动。在这两个应用中,我们都使用D波量子退火机来解决机器学习算法的NP难二进制优化问题。 摘要:We present regression and compression algorithms for lattice QCD data utilizing the efficient binary optimization ability of quantum annealers. In the regression algorithm, we encode the correlation between the input and output variables into a sparse coding machine learning algorithm. The trained correlation pattern is used to predict lattice QCD observables of unseen lattice configurations from other observables measured on the lattice. In the compression algorithm, we define a mapping from lattice QCD data of floating-point numbers to the binary coefficients that closely reconstruct the input data from a set of basis vectors. Since the reconstruction is not exact, the mapping defines a lossy compression, but, a reasonably small number of binary coefficients are able to reconstruct the input vector of lattice QCD data with the reconstruction error much smaller than the statistical fluctuation. In both applications, we use D-Wave quantum annealers to solve the NP-hard binary optimization problems of the machine learning algorithms.
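下面用一个很小的例子说明摘要中"把浮点数据映射为基向量上的二进制系数"的有损压缩思想:对每个输入向量,在 {0,1}^k 上搜索使重构误差最小的二进制系数。论文中该二进制优化交由 D-Wave 量子退火机求解,这里仅用暴力枚举代替;基矩阵与维度均为假设取值。

```python
# 最小示意:用二进制系数在固定基上重构向量(对应摘要中的有损压缩思想)
# 真实工作中该 {0,1}^k 优化交由量子退火机求解,此处用暴力枚举代替;基与维度为示例取值
import itertools
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 6                       # 数据维度 d,二进制系数个数 k
B = rng.normal(size=(d, k))       # 基向量矩阵(列为基向量)
x = rng.normal(size=d)            # 待压缩的"浮点观测量"

best_err, best_a = np.inf, None
for bits in itertools.product([0, 1], repeat=k):   # 2^k 个候选,等价于一个小型二进制优化
    a = np.array(bits, dtype=float)
    err = np.linalg.norm(x - B @ a) ** 2
    if err < best_err:
        best_err, best_a = err, a

print("最优二进制系数:", best_a.astype(int))
print("重构误差:", round(best_err, 4))
```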

其他神经网络|深度学习|模型|建模(30篇)

【1】 GAM Changer: Editing Generalized Additive Models with Interactive Visualization 标题:GAM转换器:使用交互式可视化编辑广义加法模型 链接:https://arxiv.org/abs/2112.03245

作者:Zijie J. Wang,Alex Kale,Harsha Nori,Peter Stella,Mark Nunnally,Duen Horng Chau,Mihaela Vorvoreanu,Jennifer Wortman Vaughan,Rich Caruana 机构:Georgia Tech ,University of Washington ,Microsoft Research ,NYU Langone Health, CHANGER Align ML Models with Human Knowledge, Slice, B, History Panel, A GAM Canvas, B, Metric Panel, B, Feature Panel, Pneumonia Risk 备注:7 pages, 15 figures, accepted to the Research2Clinics workshop at NeurIPS 2021. For a demo video, see this https URL For a live demo, visit this https URL 摘要:可解释机器学习(ML)研究的最新进展表明,模型利用数据中的不良模式进行预测,这可能会对部署造成危害。然而,目前尚不清楚我们如何修复这些模型。我们介绍我们正在进行的工作,GAM转换器,一个开源的交互式系统,帮助数据科学家和领域专家轻松、负责地编辑他们的广义相加模型(GAMs)。借助新颖的可视化技术,我们的工具将可解释性转化为行动——使人类用户能够分析、验证模型行为,并使其与知识和价值观相一致。使用现代web技术构建,我们的工具在用户的计算笔记本或web浏览器中本地运行,无需额外的计算资源,降低了创建更负责任的ML模型的障碍。GAM转换器可在以下位置获得:https://interpret.ml/gam-changer. 摘要:Recent strides in interpretable machine learning (ML) research reveal that models exploit undesirable patterns in the data to make predictions, which potentially causes harms in deployment. However, it is unclear how we can fix these models. We present our ongoing work, GAM Changer, an open-source interactive system to help data scientists and domain experts easily and responsibly edit their Generalized Additive Models (GAMs). With novel visualization techniques, our tool puts interpretability into action -- empowering human users to analyze, validate, and align model behaviors with their knowledge and values. Built using modern web technologies, our tool runs locally in users' computational notebooks or web browsers without requiring extra compute resources, lowering the barrier to creating more responsible ML models. GAM Changer is available at https://interpret.ml/gam-changer.

【2】 Traversing Time with Multi-Resolution Gaussian Process State-Space Models 标题:基于多分辨率高斯过程状态空间模型的时间遍历 链接:https://arxiv.org/abs/2112.03230

作者:Krista Longi,Jakob Lindinger,Olaf Duennbier,Melih Kandemir,Arto Klami,Barbara Rakitsch 机构:University of Helsinki, Bosch Center for Artificial Intelligence, Robert Bosch GmbH, University of Southern Denmark 摘要:高斯过程状态空间模型通过将高斯过程置于过渡函数之前,以原则性的方式捕获复杂的时间依赖关系。这些模型自然地解释为离散化的随机微分方程,但对于具有快速和慢速转换的长序列的推断是困难的。快速转换需要严格离散化,而慢速转换需要在长子轨道上反向传播梯度。我们提出了一种新的高斯过程状态空间结构,由多个组件组成,每个组件以不同的分辨率进行训练,以模拟不同时间尺度上的影响。组合模型允许在自适应尺度上遍历时间,为具有复杂动力学的任意长序列提供有效的推理。我们在半合成数据和发动机建模任务上对我们的新方法进行了基准测试。在这两个实验中,我们的方法与仅在单一时间尺度上运行的最先进的替代方法相比都是有利的。 摘要:Gaussian Process state-space models capture complex temporal dependencies in a principled manner by placing a Gaussian Process prior on the transition function. These models have a natural interpretation as discretized stochastic differential equations, but inference for long sequences with fast and slow transitions is difficult. Fast transitions need tight discretizations whereas slow transitions require backpropagating the gradients over long subtrajectories. We propose a novel Gaussian process state-space architecture composed of multiple components, each trained on a different resolution, to model effects on different timescales. The combined model allows traversing time on adaptive scales, providing efficient inference for arbitrarily long sequences with complex dynamics. We benchmark our novel method on semi-synthetic data and on an engine modeling task. In both experiments, our approach compares favorably against its state-of-the-art alternatives that operate on a single time-scale only.

【3】 CALVIN: A Benchmark for Language-conditioned Policy Learning for Long-horizon Robot Manipulation Tasks 标题:CALVIN:一种用于长视距机器人操作任务的语言条件策略学习基准 链接:https://arxiv.org/abs/2112.03227

作者:Oier Mees,Lukas Hermann,Erick Rosete-Beas,Wolfram Burgard 机构:All authors are with the University of Freiburg 备注:this http URL 摘要:在其环境中与人类共存的通用机器人必须学会将人类语言与其感知和行动联系起来,以便在一系列日常任务中发挥作用。此外,他们还需要掌握多种通用技能,通过遵循无约束的语言指令来完成长期任务。在本文中,我们介绍了CALVIN(从语言和视觉合成动作),这是一个开源的模拟基准,用于学习长视野语言条件任务。我们的目标是开发能够解决许多机器人操作任务的代理,这些任务可以通过车载传感器完成,并且只通过人类语言指定。CALVIN任务在序列长度、动作空间和语言方面比现有的视觉和语言任务数据集更复杂,并且支持传感器套件的灵活规范。我们在零样本(zero-shot)设置下评估代理对新语言指令以及新环境和新对象的泛化能力。我们发现,基于多上下文模仿学习的基线模型在CALVIN上表现不佳,这表明在该基准上开发能够将人类语言与其世界模型相关联的创新代理仍有很大空间。 摘要:General-purpose robots coexisting with humans in their environment must learn to relate human language to their perceptions and actions to be useful in a range of daily tasks. Moreover, they need to acquire a diverse repertoire of general-purpose skills that allow composing long-horizon tasks by following unconstrained language instructions. In this paper, we present CALVIN (Composing Actions from Language and Vision), an open-source simulated benchmark to learn long-horizon language-conditioned tasks. Our aim is to make it possible to develop agents that can solve many robotic manipulation tasks over a long horizon, from onboard sensors, and specified only via human language. CALVIN tasks are more complex in terms of sequence length, action space, and language than existing vision-and-language task datasets and supports flexible specification of sensor suites. We evaluate the agents in zero-shot to novel language instructions and to novel environments and objects. We find that a baseline model based on multi-context imitation learning performs poorly on CALVIN, suggesting that there is significant room for developing innovative agents that learn to relate human language to their world models with this benchmark.

【4】 Multi-scale Feature Learning Dynamics: Insights for Double Descent 标题:多尺度特征学习动态:对双重下降的见解 链接:https://arxiv.org/abs/2112.03215

作者:Mohammad Pezeshki,Amartya Mitra,Yoshua Bengio,Guillaume Lajoie 机构:Mila, Universit´e de Montr´eal, University of California, Riverside 摘要:建立深度学习理论基础的一个关键挑战是神经网络的复杂优化动力学,这是由大量网络参数之间的高维交互作用造成的。这种非平凡的动态导致了有趣的行为,如泛化误差的“双下降”现象。这一现象的更普遍研究方面对应于模型双下降,其中测试误差随着模型复杂性的增加呈现第二次下降,超过了经典的U形误差曲线。在这项工作中,我们研究了较少研究的历元双下降的起源,其中测试误差经历了两个非单调的转变,或者随着训练时间的增加而下降。通过利用统计物理的工具,我们研究了一个线性师生结构,它表现出与深层神经网络类似的划时代双下降。在此背景下,我们推导了广义误差在训练过程中演化的封闭形式解析表达式。我们发现,双下降可以归因于在不同尺度上学习的不同特征:随着快速学习特征的过度拟合,较慢的学习特征开始拟合,导致测试错误再次下降。我们通过数值实验验证了我们的发现,我们的理论准确地预测了实证结果,并与深层神经网络中的观察结果保持一致。 摘要:A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial dynamics lead to intriguing behaviors such as the phenomenon of "double descent" of the generalization error. The more commonly studied aspect of this phenomenon corresponds to model-wise double descent where the test error exhibits a second descent with increasing model complexity, beyond the classical U-shaped error curve. In this work, we investigate the origins of the less studied epoch-wise double descent in which the test error undergoes two non-monotonous transitions, or descents as the training time increases. By leveraging tools from statistical physics, we study a linear teacher-student setup exhibiting epoch-wise double descent similar to that in deep neural networks. In this setting, we derive closed-form analytical expressions for the evolution of generalization error over training. We find that double descent can be attributed to distinct features being learned at different scales: as fast-learning features overfit, slower-learning features start to fit, resulting in a second descent in test error. We validate our findings through numerical experiments where our theory accurately predicts empirical findings and remains consistent with observations in deep neural networks.

【5】 Quantifying Adaptability in Pre-trained Language Models with 500 Tasks 标题:量化具有500个任务的预训练语言模型中的适应性 链接:https://arxiv.org/abs/2112.03204

作者:Belinda Z. Li,Jane Yu,Madian Khabsa,Luke Zettlemoyer,Alon Halevy,Jacob Andreas 机构:MIT, Meta AI 备注:18 pages, 5 figures, 8 tables 摘要:当一个神经语言模型(LM)适合执行一项新任务时,任务的哪些方面可以预测模型的最终性能?在NLP中,LM推广到单个示例的系统特征得到了很好的描述,但LM适应新任务的系统方面还没有得到很好的理解。我们使用一个新的基准TaskBench500对LM适应性的特征和限制进行了大规模实证研究,该基准由500个程序生成的序列建模任务构建而成。这些任务结合了语言处理的核心方面,包括词汇语义、序列处理、记忆、逻辑推理和世界知识。使用TaskBench500,我们评估了适应性的三个方面,发现:(1)适应性程序在记忆小数据集的能力上存在显著差异;(2) 在任务类型的子集中,适应过程表现出对复杂任务的组合适应性;(3)无法匹配训练标签分布的原因是预测单个标签固有困难的不匹配。我们的实验表明,对新任务的适应性,如对新示例的概括,可以被系统地描述和理解,我们最后讨论了可使用新基准研究的适应性的其他方面。 摘要:When a neural language model (LM) is adapted to perform a new task, what aspects of the task predict the eventual performance of the model? In NLP, systematic features of LM generalization to individual examples are well characterized, but systematic aspects of LM adaptability to new tasks are not nearly as well understood. We present a large-scale empirical study of the features and limits of LM adaptability using a new benchmark, TaskBench500, built from 500 procedurally generated sequence modeling tasks. These tasks combine core aspects of language processing, including lexical semantics, sequence processing, memorization, logical reasoning, and world knowledge. Using TaskBench500, we evaluate three facets of adaptability, finding that: (1) adaptation procedures differ dramatically in their ability to memorize small datasets; (2) within a subset of task types, adaptation procedures exhibit compositional adaptability to complex tasks; and (3) failure to match training label distributions is explained by mismatches in the intrinsic difficulty of predicting individual labels. Our experiments show that adaptability to new tasks, like generalization to new examples, can be systematically described and understood, and we conclude with a discussion of additional aspects of adaptability that could be studied using the new benchmark.

【6】 Label-Efficient Semantic Segmentation with Diffusion Models 标题:基于扩散模型的高效标注语义分割 链接:https://arxiv.org/abs/2112.03126

作者:Dmitry Baranchuk,Ivan Rubachev,Andrey Voynov,Valentin Khrulkov,Artem Babenko 机构: Yandex, Russia, National Research University Higher School of Economics, Russia 摘要:去噪扩散概率模型最近受到了广泛的研究关注,因为它们优于其他方法,如GANs,并且目前提供了最先进的生成性能。扩散模型优越的性能使其在修复、超分辨率和语义编辑等应用中成为一种极具吸引力的工具。在本文中,我们证明了扩散模型也可以作为语义分割的工具,特别是在标记数据稀少的情况下。特别是,对于几个预训练扩散模型,我们研究了执行反向扩散过程马尔可夫步的网络的中间激活。我们表明,这些激活有效地捕获了输入图像的语义信息,并且似乎是分割问题的优秀像素级表示。基于这些观察结果,我们描述了一种简单的分割方法,即使只提供少量的训练图像,该方法也可以工作。我们的方法在几个数据集上显著优于现有的替代方法,以获得相同数量的人类监督。 摘要:Denoising diffusion probabilistic models have recently received much research attention since they outperform alternative approaches, such as GANs, and currently provide state-of-the-art generative performance. The superior performance of diffusion models has made them an appealing tool in several applications, including inpainting, super-resolution, and semantic editing. In this paper, we demonstrate that diffusion models can also serve as an instrument for semantic segmentation, especially in the setup when labeled data is scarce. In particular, for several pretrained diffusion models, we investigate the intermediate activations from the networks that perform the Markov step of the reverse diffusion process. We show that these activations effectively capture the semantic information from an input image and appear to be excellent pixel-level representations for the segmentation problem. Based on these observations, we describe a simple segmentation method, which can work even if only a few training images are provided. Our approach significantly outperforms the existing alternatives on several datasets for the same amount of human supervision.
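下面是一个与具体扩散模型无关的示意:用 PyTorch 的 forward hook 收集任意卷积网络的中间激活,上采样到输入分辨率后拼接为逐像素特征,再用少量标注训练一个简单分类器完成分割。这里的骨干网络仅用一个随机初始化的小型卷积网代替真正的预训练扩散 U-Net,仅用于说明"中间激活作为像素级表示"的流程,并非论文实现。

```python
# 示意:用 forward hook 收集中间激活并作为逐像素特征
# (骨干网络为占位的随机卷积网;真实做法应替换为扩散 U-Net 在某个反向扩散步的激活)
import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)

activations = []
def hook(_module, _inp, out):
    activations.append(out)

for layer in backbone:
    if isinstance(layer, nn.Conv2d):
        layer.register_forward_hook(hook)

x = torch.randn(1, 3, 64, 64)                     # 一张待分割的图像(示例数据)
with torch.no_grad():
    backbone(x)

# 将各层激活上采样到输入分辨率并在通道维拼接 -> 每个像素一个特征向量
feats = torch.cat(
    [torch.nn.functional.interpolate(a, size=(64, 64), mode="bilinear",
                                     align_corners=False) for a in activations],
    dim=1,
)
pixel_feats = feats.squeeze(0).permute(1, 2, 0).reshape(-1, feats.shape[1])
print("逐像素特征形状:", pixel_feats.shape)        # [64*64, 16+32+64]
# 之后可用少量标注像素训练一个线性分类器或小型 MLP 得到分割结果
```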

【7】 Keyword Assisted Embedded Topic Model 标题:关键词辅助的嵌入式主题模型 链接:https://arxiv.org/abs/2112.03101

作者:Bahareh Harandizadeh,J. Hunter Priniski,Fred Morstatter 机构:Information Science Institute, University of Southern California, Los Angeles, CA, USA, Department of Psychology, University of California, Los Angeles 备注:8 pages, 5 figures, WSDM 2022 Conference 摘要:通过揭示文本语料库中的潜在结构,主题模型是对大量文档进行分类、总结和探索的重要工具。概率主题模型,如潜在Dirichlet分配(LDA),描述了文档中的单词是如何通过一组称为主题的潜在分布生成的。最近,嵌入式主题模型(ETM)扩展了LDA,以利用单词嵌入中的语义信息来派生语义更丰富的主题。由于LDA及其扩展是无监督的模型,它们的定义不能有效地利用用户对领域的先验知识。为此,我们提出了关键字辅助嵌入式主题模型(KeyETM),该模型使ETM能够将用户知识以信息性主题优先于词汇表的形式结合起来。使用定量指标和人类对主题入侵任务的反应,我们证明了KeyETM比文献中的其他引导生成模型产生更好的主题。 摘要:By illuminating latent structures in a corpus of text, topic models are an essential tool for categorizing, summarizing, and exploring large collections of documents. Probabilistic topic models, such as latent Dirichlet allocation (LDA), describe how words in documents are generated via a set of latent distributions called topics. Recently, the Embedded Topic Model (ETM) has extended LDA to utilize the semantic information in word embeddings to derive semantically richer topics. As LDA and its extensions are unsupervised models, they aren't defined to make efficient use of a user's prior knowledge of the domain. To this end, we propose the Keyword Assisted Embedded Topic Model (KeyETM), which equips ETM with the ability to incorporate user knowledge in the form of informative topic-level priors over the vocabulary. Using both quantitative metrics and human responses on a topic intrusion task, we demonstrate that KeyETM produces better topics than other guided, generative models in the literature.

【8】 Flexible Option Learning 标题:灵活的选项学习 链接:https://arxiv.org/abs/2112.03097

作者:Martin Klissarov,Doina Precup 机构:Mila, McGill University and DeepMind 备注:NeurIPS 2021 Spotlight 摘要:强化学习(RL)中的时间抽象通过更有效地传播信息,为改善复杂环境中的泛化和知识迁移提供了希望。虽然选项(option)学习最初的形式化方式允许使用离策略(off-policy)的选项内学习同时更新多个选项(Sutton、Precup & Singh,1999),但最近的许多分层强化学习方法一次只更新一个选项:当前正在执行的选项。我们在深度强化学习的背景下重新审视和扩展选项内学习,以便能够更新与当前原始动作选择一致的所有选项,而不引入任何额外的估计量。因此,我们的方法可以自然地被大多数分层RL框架采用。当我们将我们的方法与用于选项发现的option-critic算法相结合时,我们在许多领域的性能和数据效率都得到了显著的提高。 摘要:Temporal abstraction in reinforcement learning (RL), offers the promise of improving generalization and knowledge transfer in complex environments, by propagating information more efficiently over time. Although option learning was initially formulated in a way that allows updating many options simultaneously, using off-policy, intra-option learning (Sutton, Precup & Singh, 1999), many of the recent hierarchical reinforcement learning approaches only update a single option at a time: the option currently executing. We revisit and extend intra-option learning in the context of deep reinforcement learning, in order to enable updating all options consistent with current primitive action choices, without introducing any additional estimates. Our method can therefore be naturally adopted in most hierarchical RL frameworks. When we combine our approach with the option-critic algorithm for option discovery, we obtain significant improvements in performance and data-efficiency across a wide variety of domains.
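下面用表格型的小例子示意"选项内(intra-option)学习用同一条转移更新所有与所选动作一致的选项"的想法:凡是在状态 s 下会以较高概率选择动作 a 的选项,都用转移 (s, a, r, s') 更新其价值。环境、选项内策略与一致性阈值均为假设的玩具设置,并非论文算法的完整实现。

```python
# 玩具示意:选项内学习,用同一条转移更新所有与所选动作一致的选项(非论文完整实现)
# 假设:离散状态/动作、随机给定的选项内策略、一致性阈值 0.5 等均为示例设置
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_options = 5, 3, 4
gamma, lr = 0.9, 0.1

pi_opt = rng.dirichlet(np.ones(n_actions), size=(n_options, n_states))  # 各选项的选项内策略
Q = np.zeros((n_options, n_states))                                     # 每个选项的状态价值

def step(s, a):
    # 玩具环境:随机转移,动作 0 在状态 0 得到奖励 1
    r = 1.0 if (s == 0 and a == 0) else 0.0
    return r, rng.integers(n_states)

s = rng.integers(n_states)
for t in range(1000):
    a = rng.integers(n_actions)                 # 行为策略:纯随机探索
    r, s_next = step(s, a)
    target = r + gamma * Q[:, s_next].max()
    # 关键点:凡是在 s 下会以不低于 0.5 概率选到 a 的选项,都用该转移更新
    consistent = pi_opt[:, s, a] >= 0.5
    Q[consistent, s] += lr * (target - Q[consistent, s])
    s = s_next

print("各选项学到的状态价值:\n", np.round(Q, 2))
```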

【9】 Learning Generalized Causal Structure in Time-series 标题:学习时间序列中的广义因果结构 链接:https://arxiv.org/abs/2112.03085

作者:Aditi Kathpalia,Keerti P. Charantimath,Nithin Nagaraj 机构:Department of Complex Systems, Institute of Computer Science of the Czech Academy of, Sciences, Prague, Czech Republic, Keerti Panchakshari Charantimath, Department of Mathematics, Indian Institute of Technology, Kharagpur, West Bengal, India 备注:10 pages, 4 figures 摘要:因果关系科学通过为此目的提供数学工具来解释/确定系统实体之间的“因果”关系。尽管机器学习(ML)算法取得了巨大的成功并得到了广泛的应用,但这些算法仅仅基于统计学习。目前,他们与“类人”智力相差甚远,因为他们无法回答和学习重要的“为什么?”问题。因此,研究人员正试图将ML与因果关系科学相结合。在ML遇到的许多因果学习问题中,一个是这些算法对数据的时间顺序或结构不敏感。在这项工作中,我们基于最近提出的“神经混沌”特征学习技术(ChaosFEX特征提取器)开发了一个机器学习管道,帮助我们学习给定时间序列数据中的广义因果结构。 摘要:The science of causality explains/determines 'cause-effect' relationship between the entities of a system by providing mathematical tools for the purpose. In spite of all the success and widespread applications of machine-learning (ML) algorithms, these algorithms are based on statistical learning alone. Currently, they are nowhere close to 'human-like' intelligence as they fail to answer and learn based on the important "Why?" questions. Hence, researchers are attempting to integrate ML with the science of causality. Among the many causal learning issues encountered by ML, one is that these algorithms are dumb to the temporal order or structure in data. In this work we develop a machine learning pipeline based on a recently proposed 'neurochaos' feature learning technique (ChaosFEX feature extractor), that helps us to learn generalized causal-structure in given time-series data.

【10】 Thinking Beyond Distributions in Testing Machine Learned Models 标题:机器学习模型测试中的超越分布思考 链接:https://arxiv.org/abs/2112.03057

作者:Negar Rostamzadeh,Ben Hutchinson,Christina Greer,Vinodkumar Prabhakaran 机构:Google Research 备注:None 摘要:机器学习(ML)社区内的测试实践集中于评估学习模型的预测性能,该模型根据测试数据集进行测量,测试数据集通常来自与训练数据集相同的分布。尽管最近在ML社区内关于稳健性和公平性测试的工作指出了针对分布变化进行测试的重要性,但这些工作也集中于估计模型对参考数据集/分布产生错误的可能性。我们认为,这种测试观点积极地阻止了研究人员和开发人员研究健壮性故障的其他来源,例如可能产生严重不良影响的角落案例。我们与软件工程测试中数十年的工作进行了对比,这些工作专注于评估软件系统在各种压力条件下的性能,包括角落案例,而不是仅仅关注平均案例行为。最后,我们提出了一系列建议,将机器学习测试的视野扩展到严格的实践中。 摘要:Testing practices within the machine learning (ML) community have centered around assessing a learned model's predictive performance measured against a test dataset, often drawn from the same distribution as the training dataset. While recent work on robustness and fairness testing within the ML community has pointed to the importance of testing against distributional shifts, these efforts also focus on estimating the likelihood of the model making an error against a reference dataset/distribution. We argue that this view of testing actively discourages researchers and developers from looking into other sources of robustness failures, for instance corner cases which may have severe undesirable impacts. We draw parallels with decades of work within software engineering testing focused on assessing a software system against various stress conditions, including corner cases, as opposed to solely focusing on average-case behaviour. Finally, we put forth a set of recommendations to broaden the view of machine learning testing to a rigorous practice.

【11】 Keeping it Simple: Language Models can learn Complex Molecular Distributions 标题:保持简单:语言模型可以学习复杂的分子分布 链接:https://arxiv.org/abs/2112.03041

作者:Daniel Flam-Shepherd,Kevin Zhu,Alán Aspuru-Guzik 机构:Department of Computer Science, University of Toronto, Toronto, Ontario M,S ,E, Canada, Vector Institute for Artificial Intelligence, Toronto, Ontario M,S ,M, Canada, Department of Chemistry, University of Toronto, Toronto, Ontario M,G ,Z, Canada 摘要:分子的深层生成模型在相关数据集上得到了训练,并得到了极大的普及,这些模型被用来搜索化学空间。新功能化合物逆向设计的生成模型的下游效用取决于它们学习分子训练分布的能力。最简单的例子是采用递归神经网络形式的语言模型,并使用字符串表示生成分子。更复杂的是图形生成模型,它按顺序构造分子图,通常实现最先进的结果。然而,最近的研究表明,语言模型比人们曾经认为的更有能力,特别是在数据量较低的情况下。在这项工作中,我们研究了简单语言模型学习分子分布的能力。为此,我们通过编译特别复杂的分子分布来介绍几个具有挑战性的生成性建模任务。在每项任务中,我们将语言模型的能力与两种广泛使用的图形生成模型进行比较。结果表明,语言模型是强大的生成模型,能够熟练地学习复杂的分子分布,并且比图形模型具有更好的性能。语言模型可以准确地生成:ZINC15中得分最高的惩罚LogP分子的分布、多峰分子分布以及PubChem中的最大分子。 摘要:Deep generative models of molecules have grown immensely in popularity, trained on relevant datasets, these models are used to search through chemical space. The downstream utility of generative models for the inverse design of novel functional compounds depends on their ability to learn a training distribution of molecules. The most simple example is a language model that takes the form of a recurrent neural network and generates molecules using a string representation. More sophisticated are graph generative models, which sequentially construct molecular graphs and typically achieve state of the art results. However, recent work has shown that language models are more capable than once thought, particularly in the low data regime. In this work, we investigate the capacity of simple language models to learn distributions of molecules. For this purpose, we introduce several challenging generative modeling tasks by compiling especially complex distributions of molecules. On each task, we evaluate the ability of language models as compared with two widely used graph generative models. The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions -- and yield better performance than the graph models. Language models can accurately generate: distributions of the highest scoring penalized LogP molecules in ZINC15, multi-modal molecular distributions as well as the largest molecules in PubChem.
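下面给出一个字符级 SMILES 语言模型的最小 PyTorch 草图,对应摘要中"以字符串表示生成分子的循环神经网络语言模型";词表、超参数与这几条玩具 SMILES 串均为假设,仅用于说明训练流程,并非论文使用的模型配置。

```python
# 最小示意:字符级 SMILES 循环语言模型(词表/数据/超参数均为示例)
import torch
import torch.nn as nn

smiles = ["CCO", "CCC", "C1CCCCC1", "CC(=O)O"]        # 玩具训练集
chars = sorted(set("".join(smiles)) | {"^", "$"})      # ^/$ 为起止符
stoi = {c: i for i, c in enumerate(chars)}

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, 32)
        self.rnn = nn.LSTM(32, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    total = 0.0
    for s in smiles:
        ids = torch.tensor([[stoi[c] for c in "^" + s + "$"]])
        logits = model(ids[:, :-1])                    # 预测下一个字符
        loss = loss_fn(logits.reshape(-1, len(chars)), ids[:, 1:].reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()
        total += loss.item()
print("最终平均交叉熵:", round(total / len(smiles), 3))
```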

【12】 Language Semantics Interpretation with an Interaction-based Recurrent Neural Networks 标题:基于交互的递归神经网络语言语义解释 链接:https://arxiv.org/abs/2112.02997

作者:Shaw-Hwa Lo,Yiqiao Yin 机构:Statistics Department, Columbia University 摘要:文本分类是自然语言处理中的一项基本任务。各种顺序模型能够做出良好的预测,但语言语义和预测结果之间缺乏联系。本文提出了一种新的影响分数(I-score)、一种称为向后丢弃算法(BDA)的贪婪搜索算法,以及一种称为"匕首技术"的新特征工程技术。首先,本文提出了一种新的影响分数(I-score),用于检测和搜索文本文档中对文本分类任务预测有用的重要语言语义。接下来,提出了向后丢弃算法这一贪婪搜索算法来处理数据集中的长期依赖关系。此外,本文还提出了一种称为"匕首技术"的新工程技术,它充分保留了解释变量和响应变量之间的关系。所提出的技术可以进一步推广到任何前馈人工神经网络(ANN)、卷积神经网络(CNN)以及任何神经网络。我们在互联网电影数据库(IMDB)上给出了一个真实应用:应用所提出的方法提升预测性能,与其他未采用I-score和"匕首技术"的流行方法相比,误差降低了81%。 摘要:Text classification is a fundamental language task in Natural Language Processing. A variety of sequential models is capable making good predictions yet there is lack of connection between language semantics and prediction results. This paper proposes a novel influence score (I-score), a greedy search algorithm called Backward Dropping Algorithm (BDA), and a novel feature engineering technique called the "dagger technique". First, the paper proposes a novel influence score (I-score) to detect and search for the important language semantics in text document that are useful for making good prediction in text classification tasks. Next, a greedy search algorithm called the Backward Dropping Algorithm is proposed to handle long-term dependencies in the dataset. Moreover, the paper proposes a novel engineering technique called the "dagger technique" that fully preserve the relationship between explanatory variable and response variable. The proposed techniques can be further generalized into any feed-forward Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs), and any neural network. A real-world application on the Internet Movie Database (IMDB) is used and the proposed methods are applied to improve prediction performance with an 81% error reduction comparing with other popular peers if I-score and "dagger technique" are not implemented.

【13】 Two Wrongs Don't Make a Right: Combating Confirmation Bias in Learning with Label Noise 标题:负负不得正:对抗带标签噪声学习中的确认偏差 链接:https://arxiv.org/abs/2112.02960

作者:Mingcai Chen,Hao Cheng,Yuntao Du,Ming Xu,Wenyu Jiang,Chongjun Wang 机构:State Key Laboratory for Novel Software Technology at Nanjing University, Nanjing University, Nanjing , China 摘要:噪声标签会损害深度网络的性能。对于稳健学习,一个突出的两阶段管道交替进行,即消除可能的错误标签和半监督训练。但是,丢弃部分观察到的标签可能会导致信息丢失,特别是当损坏不是完全随机的(例如,依赖于类或依赖于实例)时。此外,从一个具有代表性的两阶段方法DivideMix的训练动力学,我们确定了确认偏差的支配地位:伪标签无法纠正大量有噪声的标签,从而导致错误累积。为了充分利用观察到的标签信息并减少错误纠正,我们提出了鲁棒标签翻新(鲁棒LR)——一种新的混合方法,该方法集成了伪标签和置信度估计技术来翻新噪声标签。我们表明,我们的方法成功地减轻了标签噪声和确认偏差的损害。因此,它可以在数据集和噪声类型之间获得最先进的结果。例如,与之前在真实世界的嘈杂数据集WebVision上的最佳精度相比,Robust LR实现了高达4.5%的绝对top-1精度改进。 摘要:Noisy labels damage the performance of deep networks. For robust learning, a prominent two-stage pipeline alternates between eliminating possible incorrect labels and semi-supervised training. However, discarding part of observed labels could result in a loss of information, especially when the corruption is not completely random, e.g., class-dependent or instance-dependent. Moreover, from the training dynamics of a representative two-stage method DivideMix, we identify the domination of confirmation bias: Pseudo-labels fail to correct a considerable amount of noisy labels and consequently, the errors accumulate. To sufficiently exploit information from observed labels and mitigate wrong corrections, we propose Robust Label Refurbishment (Robust LR)-a new hybrid method that integrates pseudo-labeling and confidence estimation techniques to refurbish noisy labels. We show that our method successfully alleviates the damage of both label noise and confirmation bias. As a result, it achieves state-of-the-art results across datasets and noise types. For example, Robust LR achieves up to 4.5% absolute top-1 accuracy improvement over the previous best on the real-world noisy dataset WebVision.
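下面是"标签翻新"这一通用思想的最小示意:把观察到的(可能有噪的)one-hot 标签与模型预测按置信度做凸组合,得到翻新后的软标签再计算损失。论文中置信度的估计方式更复杂,这里仅用预测的最大概率代替,属于假设性的简化,并非 Robust LR 的原实现。

```python
# 最小示意:置信度加权的标签翻新(Robust LR 思想的简化示意,非原论文实现)
# 假设:用模型预测的最大概率作为置信度 w,翻新标签 = w*预测 + (1-w)*观测标签
import torch
import torch.nn.functional as F

def refurbish_loss(logits, noisy_labels, num_classes):
    probs = logits.softmax(dim=-1)
    onehot = F.one_hot(noisy_labels, num_classes).float()
    w = probs.max(dim=-1, keepdim=True).values.detach()   # 置信度(简化估计)
    refurbished = w * probs.detach() + (1 - w) * onehot   # 翻新后的软标签
    return -(refurbished * logits.log_softmax(dim=-1)).sum(dim=-1).mean()

logits = torch.randn(8, 5, requires_grad=True)            # 8 个样本、5 类(示例数据)
noisy_labels = torch.randint(0, 5, (8,))
loss = refurbish_loss(logits, noisy_labels, num_classes=5)
loss.backward()
print("翻新后的训练损失:", loss.item())
```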

【14】 Automap: Towards Ergonomic Automated Parallelism for ML Models 标题:AUTOMAP:面向ML模型的人机工程自动并行化 链接:https://arxiv.org/abs/2112.02958

作者:Michael Schaarschmidt,Dominik Grewe,Dimitrios Vytiniotis,Adam Paszke,Georg Stefan Schmid,Tamara Norman,James Molloy,Jonathan Godwin,Norman Alexander Rink,Vinod Nair,Dan Belov 机构:DeepMind, Google Research, EPFL 备注:Workshop on ML for Systems at NeurIPS 2021 摘要:对训练大型神经网络结构的需求迅速增加,这使得对划分策略的需求成为焦点,例如通过使用数据、模型或管道并行性。实现这些方法越来越受到程序原语的支持,但识别有效的分区策略需要昂贵的实验和专业知识。我们展示了一个自动分区器的原型,它无缝地集成到现有的编译器和现有的用户工作流中。我们的分区器支持SPMD风格的并行,包括数据并行和参数/激活分片。通过在独立于平台的分区IR中结合归纳策略和搜索,automap可以恢复专家分区策略,如Transformer层的Megatron分片。 摘要:The rapid rise in demand for training large neural network architectures has brought into focus the need for partitioning strategies, for example by using data, model, or pipeline parallelism. Implementing these methods is increasingly supported through program primitives, but identifying efficient partitioning strategies requires expensive experimentation and expertise. We present the prototype of an automated partitioner that seamlessly integrates into existing compilers and existing user workflows. Our partitioner enables SPMD-style parallelism that encompasses data parallelism and parameter/activation sharding. Through a combination of inductive tactics and search in a platform-independent partitioning IR, automap can recover expert partitioning strategies such as Megatron sharding for transformer layers.
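作为对"Megatron 式分片"的一个数值说明,下面用 NumPy 演示把一个两层前馈的权重分别按列、按行切分到两个"设备"上计算,再归约求和,其结果与不切分时一致;切分份数与矩阵规模均为示例取值,与 automap 的实现无关。

```python
# 数值示意:Megatron 风格的张量并行(列切分 + 行切分),与单机结果一致
# 假设:2 个"设备"、随机权重,仅用于说明切分的正确性,与 automap 实现无关
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # [batch, d_model]
W1 = rng.normal(size=(8, 16))        # 第一层权重(按列切分)
W2 = rng.normal(size=(16, 8))        # 第二层权重(按行切分)

def relu(a):
    return np.maximum(a, 0)

# 单机参考结果
ref = relu(x @ W1) @ W2

# "设备" i 持有 W1 的第 i 份列块和 W2 的第 i 份行块
W1_shards = np.split(W1, 2, axis=1)
W2_shards = np.split(W2, 2, axis=0)
partials = [relu(x @ W1_shards[i]) @ W2_shards[i] for i in range(2)]
out = sum(partials)                  # 对应一次 all-reduce

print("与单机结果的最大误差:", np.abs(out - ref).max())
```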

【15】 Is Class-Incremental Enough for Continual Learning? 标题:类增量对于持续学习足够吗? 链接:https://arxiv.org/abs/2112.02925

作者:Andrea Cossu,Gabriele Graffieti,Lorenzo Pellegrini,Davide Maltoni,Davide Bacciu,Antonio Carta,Vincenzo Lomonaco 机构:Pervasive AI Lab, Computer Science Department, University of Pisa, Pisa, Italy, Scuola Normale Superiore, Pisa, Italy, Biometric System & Smart City Lab, Computer Science Department, University of Bologna, Bologna, Italy 备注:Under review 摘要:模型持续学习的能力可以在不同的持续学习场景中进行经验评估。每个场景都定义了学习环境的约束和机会。在这里,我们挑战了持续学习文献中的当前趋势,即主要在类增量场景中进行实验,在这种场景中,一次体验中出现的类别永远不会被重温。我们认为,过度关注这一设定可能会限制未来关于持续学习的研究,因为类增量场景人为地加剧灾难性遗忘,而牺牲了其他重要目标,如前向迁移和计算效率。事实上,在许多现实环境中,重复先前遇到的概念是自然发生的,有助于缓和先前知识的中断。我们主张对替代性持续学习场景进行更深入的研究,在该场景中,重复通过设计整合到传入信息流中。从已经存在的建议开始,我们描述了此类带重复的类增量场景可以为持续学习模型的更全面评估提供的优势。 摘要:The ability of a model to learn continually can be empirically assessed in different continual learning scenarios. Each scenario defines the constraints and the opportunities of the learning environment. Here, we challenge the current trend in the continual learning literature to experiment mainly on class-incremental scenarios, where classes present in one experience are never revisited. We posit that an excessive focus on this setting may be limiting for future research on continual learning, since class-incremental scenarios artificially exacerbate catastrophic forgetting, at the expense of other important objectives like forward transfer and computational efficiency. In many real-world environments, in fact, repetition of previously encountered concepts occurs naturally and contributes to softening the disruption of previous knowledge. We advocate for a more in-depth study of alternative continual learning scenarios, in which repetition is integrated by design in the stream of incoming information. Starting from already existing proposals, we describe the advantages such class-incremental with repetition scenarios could offer for a more comprehensive assessment of continual learning models.

【16】 A Marketplace for Trading AI Models based on Blockchain and Incentives for IoT Data 标题:基于区块链和物联网数据激励的AI模型交易市场 链接:https://arxiv.org/abs/2112.02870

作者:Lam Duc Nguyen,Shashi Raj Pandey,Soret Beatriz,Arne Broering,Petar Popovski 机构: Aalborg University 备注:14 pages, 9 figures, submitted for publication 摘要:随着机器学习(ML)模型变得越来越复杂,其核心挑战之一是大规模部署,例如公司和组织可以通过人工智能(AI)创造价值。ML中的一个新兴范例是一种联邦方法,其中学习模型部分交付给一组异构代理,允许代理使用自己的数据在本地训练模型。然而,模型估值问题以及数据/模型的合作训练和交易激励问题在文献中得到的处理有限。本文提出了一种基于可信区块链的网络上ML模型交易的新生态系统。买方可以从ML市场获得感兴趣的模型,感兴趣的卖方在其数据上花费本地计算以提高该模型的质量。在此过程中,考虑了本地数据与训练模型质量之间的比例关系,并通过分布数据Shapley值(DSV)估计了卖方在训练模型时的数据估值。同时,分布式账本技术(DLT)提供了整个交易过程的可信度。对提议的方法进行的广泛实验评估表明,运行时性能具有竞争力,执行成本降低15%,并且在激励参与者方面具有公平性。 摘要:As Machine Learning (ML) models are becoming increasingly complex, one of the central challenges is their deployment at scale, such that companies and organizations can create value through Artificial Intelligence (AI). An emerging paradigm in ML is a federated approach where the learning model is delivered to a group of heterogeneous agents partially, allowing agents to train the model locally with their own data. However, the problem of valuation of models, as well the questions of incentives for collaborative training and trading of data/models, have received limited treatment in the literature. In this paper, a new ecosystem of ML model trading over a trusted Blockchain-based network is proposed. The buyer can acquire the model of interest from the ML market, and interested sellers spend local computations on their data to enhance that model's quality. In doing so, the proportional relation between the local data and the quality of trained models is considered, and the valuations of seller's data in training the models are estimated through the distributed Data Shapley Value (DSV). At the same time, the trustworthiness of the entire trading process is provided by the distributed Ledger Technology (DLT). Extensive experimental evaluation of the proposed approach shows a competitive run-time performance, with a 15\% drop in the cost of execution, and fairness in terms of incentives for the participants.
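摘要中用分布式数据Shapley值(DSV)来评估卖方数据对训练模型的贡献;下面给出经典蒙特卡罗置换法估计数据Shapley值的最小草图,效用函数取逻辑回归在验证集上的准确率,数据规模与置换次数均为示例设定,并非论文中的分布式DSV实现。

```python
# 最小示意:蒙特卡罗置换法估计数据 Shapley 值(非论文的分布式 DSV 实现)
# 假设:效用 = 逻辑回归在验证集上的准确率;样本量、置换次数均为示例取值
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 30, 5
X = rng.normal(size=(n, d)); y = (X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int)
X_val = rng.normal(size=(200, d)); y_val = (X_val[:, 0] > 0).astype(int)

def utility(idx):
    if len(set(y[idx])) < 2:          # 类别不全时无法训练,效用记为随机猜测水平
        return 0.5
    clf = LogisticRegression().fit(X[idx], y[idx])
    return clf.score(X_val, y_val)

shapley = np.zeros(n)
T = 50                                 # 置换次数(示例取值)
for _ in range(T):
    perm = rng.permutation(n)
    prev_u = 0.5                       # 空集效用取随机猜测水平
    for k in range(1, n + 1):
        u = utility(perm[:k])
        shapley[perm[k - 1]] += (u - prev_u) / T
        prev_u = u

print("数据 Shapley 值(前 5 个样本):", np.round(shapley[:5], 3))
```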

【17】 Learning-based Measurement Scheduling for Loosely-Coupled Cooperative Localization 标题:基于学习的松散耦合协同定位测量调度 链接:https://arxiv.org/abs/2112.02843

作者:Jianan Zhu,Solmaz S. Kia 机构:and, University of California Irvine 备注:6 pages, 7 figures 摘要:在协作定位中,通信移动代理使用代理间的相对度量来改进其基于航位推算的全局定位。度量调度使代理能够决定在其计算资源有限时应处理的可用代理间相对度量的子集。最优测量调度是一个NP难的组合优化问题。所谓的顺序贪婪(SG)算法是该问题的一种流行的次优多项式时间解。然而,SG算法的价值函数评估需要访问所有地标代理(代理可以从其进行测量的队友)的状态估计向量和误差协方差矩阵。本文提出了一种CL测量调度方法,该方法遵循SG方法,但通过使用基于神经网络的代理模型作为SG算法价值函数的代理,降低了通信和计算成本。该模型的意义在于,它由本地信息驱动,并且仅由来自landmark代理的标量元数据驱动。该解决方案通过三种方式解决了运行SG算法的时间和内存复杂性问题:(a)减少代理间通信消息的大小,(b)通过使用更简单的代理(代理)函数降低函数评估的复杂性,(c)减少所需的内存大小。仿真验证了我们的结果。 摘要:In cooperative localization, communicating mobile agents use inter-agent relative measurements to improve their dead-reckoning-based global localization. Measurement scheduling enables an agent to decide which subset of available inter-agent relative measurements it should process when its computational resources are limited. Optimal measurement scheduling is an NP-hard combinatorial optimization problem. The so-called sequential greedy (SG) algorithm is a popular suboptimal polynomial-time solution for this problem. However, the merit function evaluation for the SG algorithms requires access to the state estimate vector and error covariance matrix of all the landmark agents (teammates that an agent can take measurements from). This paper proposes a measurement scheduling for CL that follows the SG approach but reduces the communication and computation cost by using a neural network-based surrogate model as a proxy for the SG algorithm's merit function. The significance of this model is that it is driven by local information and only a scalar metadata from the landmark agents. This solution addresses the time and memory complexity issues of running the SG algorithm in three ways: (a) reducing the inter-agent communication message size, (b) decreasing the complexity of function evaluations by using a simpler surrogate (proxy) function, (c) reducing the required memory size.Simulations demonstrate our results.

【18】 ED2: An Environment Dynamics Decomposition Framework for World Model Construction 标题:ED2:一种用于构建世界模型的环境动力学分解框架 链接:https://arxiv.org/abs/2112.02817

作者:Cong Wang,Tianpei Yang,Jianye Hao,Yan Zheng,Hongyao Tang,Fazl Barez,Jinyi Liu,Jiajie Peng,Haiyin Piao,Zhixiao Sun 机构:College of Intelligence and Computing, Tianjin University, School of Computer Science, Northwestern Polytechnical University 摘要:基于模型的强化学习方法在许多任务中取得了显著的样本效率,但其性能往往受到模型误差的限制。为了减少模型误差,以前的工作使用一个设计良好的网络来拟合整个环境动力学,将环境动力学视为一个黑箱。然而,这些方法没有考虑环境的可分解性质:环境动力学可能由多个可以单独建模的子动力学组成,利用这一性质可以更准确地构建世界模型。在本文中,我们提出了环境动力学分解(ED2),一种新的世界模型构建框架,以分解的方式对环境进行建模。ED2包含两个关键组件:子动力学发现(SD2)和动力学分解预测(D2P)。SD2发现环境中的子动力学,然后D2P根据子动力学构造分解的世界模型。ED2可以很容易地与现有的MBRL算法相结合,实证结果表明,ED2显著降低了模型误差,提高了最先进的MBRL算法在各种任务上的性能。 摘要:Model-based reinforcement learning methods achieve significant sample efficiency in many tasks, but their performance is often limited by the existence of the model error. To reduce the model error, previous works use a single well-designed network to fit the entire environment dynamics, which treats the environment dynamics as a black box. However, these methods lack to consider the environmental decomposed property that the dynamics may contain multiple sub-dynamics, which can be modeled separately, allowing us to construct the world model more accurately. In this paper, we propose the Environment Dynamics Decomposition (ED2), a novel world model construction framework that models the environment in a decomposing manner. ED2 contains two key components: sub-dynamics discovery (SD2) and dynamics decomposition prediction (D2P). SD2 discovers the sub-dynamics in an environment and then D2P constructs the decomposed world model following the sub-dynamics. ED2 can be easily combined with existing MBRL algorithms and empirical results show that ED2 significantly reduces the model error and boosts the performance of the state-of-the-art MBRL algorithms on various tasks.

【19】 MDPFuzzer: Finding Crash-Triggering State Sequences in Models Solving the Markov Decision Process 标题:MDPFuzzer:在求解马尔可夫决策过程的模型中寻找崩溃触发状态序列 链接:https://arxiv.org/abs/2112.02807

作者:Qi Pang,Yuanyuan Yuan,Shuai Wang 机构:The Hong Kong University of Science and Technology, Hong Kong SAR 摘要:马尔可夫决策过程(MDP)为顺序决策问题的建模提供了一个数学框架,其中许多问题对安全性和安全性至关重要,如自动驾驶和机器人控制。人工智能研究的快速发展创造了解决MDP的有效方法,如深度神经网络(DNN)、强化学习(RL)和模仿学习(IL)。然而,这些解决MDP的流行模型既没有经过彻底的测试,也没有严格的可靠性。我们介绍了MDPFuzzer,这是第一个用于解决mdp的模型的黑盒模糊测试框架。MDPFuzzer通过检查目标模型是否进入异常和危险状态来形成测试预言。在模糊化过程中,MDPFuzzer通过测量变异状态是否可以减少累积奖励或形成新的状态序列来决定保留哪个变异状态。我们使用高斯混合模型(GMMs)和动态期望最大化(DynEM)设计了有效的技术来量化状态序列的“新鲜度”。我们还通过估计目标模型相对于状态的局部敏感性,优先考虑具有揭示碰撞高可能性的状态。MDPFuzzer在解决MDP的五种最先进的模型上进行评估,包括监督DNN、RL、IL和多代理RL。我们的评估包括自动驾驶、飞机避碰和两个经常用于基准RL的游戏场景。在12小时的运行中,我们发现每个模型上有80多个碰撞触发状态序列。我们展示了鼓舞人心的发现,碰撞触发状态虽然看起来正常,但与正常状态相比,诱发了不同的神经元激活模式。我们进一步开发了一个异常行为检测器,以强化所有评估模型,并使用MDPFuzzer的发现对其进行修复,从而在不牺牲准确性的情况下显著增强其鲁棒性。 摘要:The Markov decision process (MDP) provides a mathematical framework for modeling sequential decision-making problems, many of which are crucial to security and safety, such as autonomous driving and robot control. The rapid development of artificial intelligence research has created efficient methods for solving MDPs, such as deep neural networks (DNNs), reinforcement learning (RL), and imitation learning (IL). However, these popular models for solving MDPs are neither thoroughly tested nor rigorously reliable. We present MDPFuzzer, the first blackbox fuzz testing framework for models solving MDPs. MDPFuzzer forms testing oracles by checking whether the target model enters abnormal and dangerous states. During fuzzing, MDPFuzzer decides which mutated state to retain by measuring if it can reduce cumulative rewards or form a new state sequence. We design efficient techniques to quantify the "freshness" of a state sequence using Gaussian mixture models (GMMs) and dynamic expectation-maximization (DynEM). We also prioritize states with high potential of revealing crashes by estimating the local sensitivity of target models over states. MDPFuzzer is evaluated on five state-of-the-art models for solving MDPs, including supervised DNN, RL, IL, and multi-agent RL. Our evaluation includes scenarios of autonomous driving, aircraft collision avoidance, and two games that are often used to benchmark RL. During a 12-hour run, we find over 80 crash-triggering state sequences on each model. We show inspiring findings that crash-triggering states, though look normal, induce distinct neuron activation patterns compared with normal states. We further develop an abnormal behavior detector to harden all the evaluated models and repair them with the findings of MDPFuzzer to significantly enhance their robustness without sacrificing accuracy.
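摘要中用高斯混合模型(GMM)量化状态序列的"新鲜度";下面给出一个与具体模糊测试器无关的最小示意:先用历史状态序列的特征拟合 GMM,再把新序列在该模型下的对数似然取负作为新鲜度分数(越高越"新")。特征构造与分量个数均为假设取值,并非 MDPFuzzer 的原实现。

```python
# 最小示意:用 GMM 对状态序列打"新鲜度"分(低似然 = 更新鲜),非 MDPFuzzer 原实现
# 假设:状态为 4 维向量,序列特征取均值向量,GMM 分量数为 3(示例取值)
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def seq_feature(seq):
    return seq.mean(axis=0)            # 简化:用均值向量概括一条状态序列

# 历史(已见过的)状态序列
history = [rng.normal(0, 1, size=(50, 4)) for _ in range(200)]
gmm = GaussianMixture(n_components=3, random_state=0)
gmm.fit(np.stack([seq_feature(s) for s in history]))

def freshness(seq):
    return -gmm.score_samples(seq_feature(seq)[None, :])[0]   # 负对数似然

seen_like = rng.normal(0, 1, size=(50, 4))
novel = rng.normal(3, 1, size=(50, 4))                         # 偏离历史分布的序列
print("常见序列新鲜度:", round(freshness(seen_like), 2))
print("新颖序列新鲜度:", round(freshness(novel), 2))
```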

【20】 A Survey on Deep learning based Document Image Enhancement 标题:基于深度学习的文档图像增强研究综述 链接:https://arxiv.org/abs/2112.02719

作者:Zahra Anvari,Vassilis Athitsos 机构:Department of Computer Science and Engineering, University of Texas Arlington, Arlington, TX 摘要:如今,诸如科学文章、税务表格、发票、合同文件和历史文本等数字化文档被广泛使用。由于各种原因,这些图像可能会降级或损坏,包括拍摄图像时的照明条件差、扫描图像时的阴影、噪声和模糊等失真、老化、墨迹、渗透、水印、印章等。文档图像增强和恢复在许多自动文档分析和识别任务中起着至关重要的作用,例如使用光学字符识别(OCR)进行内容提取。随着深度学习的发展,人们提出了许多方法来提高这些文档图像的质量。在本文中,我们回顾了针对不同文档图像增强问题的基于深度学习的方法、数据集和度量。我们提供了六种不同文档图像增强任务的基于深度学习的方法的全面概述,包括二值化、去模糊、去噪、去褪色、水印移除和阴影移除。我们总结了每项任务的主要最新工作,并讨论了它们的特点、挑战和局限性。我们还介绍了多个很少甚至没有受到关注的文档图像增强任务,包括曝光过度和曝光不足校正以及渗透去除,并确定了其他几个有希望的研究方向和未来研究的机会。 摘要:Digitized documents such as scientific articles, tax forms, invoices, contract papers, and historic texts, are widely used nowadays. These images could be degraded or damaged due to various reasons including poor lighting conditions when capturing the image, shadow while scanning them, distortion like noise and blur, aging, ink stain, bleed through, watermark, stamp, etc. Document image enhancement and restoration play a crucial role in many automated document analysis and recognition tasks, such as content extraction using optical character recognition (OCR). With recent advances in deep learning, many methods are proposed to enhance the quality of these document images. In this paper, we review deep learning-based methods, datasets, and metrics for different document image enhancement problems. We provide a comprehensive overview of deep learning-based methods for six different document image enhancement tasks, including binarization, deblurring, denoising, defading, watermark removal, and shadow removal. We summarize the main state-of-the-art works for each task and discuss their features, challenges, and limitations. We introduce multiple document image enhancement tasks that have received no to little attention, including over and under exposure correction and bleed-through removal, and identify several other promising research directions and opportunities for future research.

【21】 Learning Swarm Interaction Dynamics from Density Evolution 标题:从密度演化中学习群体相互作用动力学 链接:https://arxiv.org/abs/2112.02675

作者:Christos Mavridis,Amoolya Tirumalai,John Baras 机构:The authors are with the Department of Electrical and Computer Engi-neering and the Institute for Systems Research, University of Maryland 摘要:我们考虑的问题,了解协调运动的生物或人工群。在这方面,我们提出了一种学习方案,通过观察群体的密度随时间的变化来估计相互作用主体的协调规律。我们根据Cucker-Smale群集模型描述了基于成对相互作用的群的动力学,并将群的密度演化表示为平均场流体动力学方程组的解。我们提出了一个新的参数函数族来模拟两两相互作用,这使得积分微分方程的平均场宏观系统可以作为一个增强的偏微分方程系统有效地求解。最后,我们将增广系统合并到一个迭代优化方案中,从观察到的群密度随时间的演化来学习交互代理的动力学。这项工作的结果可以提供另一种方法来研究动物群如何协调,为大型网络系统创建新的控制方案,并作为对抗性无人机攻击防御机制的核心部分。 摘要:We consider the problem of understanding the coordinated movements of biological or artificial swarms. In this regard, we propose a learning scheme to estimate the coordination laws of the interacting agents from observations of the swarm's density over time. We describe the dynamics of the swarm based on pairwise interactions according to a Cucker-Smale flocking model, and express the swarm's density evolution as the solution to a system of mean-field hydrodynamic equations. We propose a new family of parametric functions to model the pairwise interactions, which allows for the mean-field macroscopic system of integro-differential equations to be efficiently solved as an augmented system of PDEs. Finally, we incorporate the augmented system in an iterative optimization scheme to learn the dynamics of the interacting agents from observations of the swarm's density evolution over time. The results of this work can offer an alternative approach to study how animal flocks coordinate, create new control schemes for large networked systems, and serve as a central part of defense mechanisms against adversarial drone attacks.
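为说明摘要中提到的Cucker-Smale成对相互作用模型,下面给出一个最小的numpy模拟草图:每个个体的速度按与其他个体距离加权的方式向群体趋同。其中的个体数与权重函数参数均为示意性假设,与论文学习到的参数化相互作用函数无关。

```python
import numpy as np

rng = np.random.default_rng(1)
N, steps, dt = 50, 200, 0.05
K, beta = 1.0, 0.5  # Cucker-Smale 通信权重参数(示意取值)

x = rng.uniform(-5, 5, size=(N, 2))   # 位置
v = rng.normal(0, 1, size=(N, 2))     # 速度

def comm_weight(dist2):
    """Cucker-Smale 通信速率 psi(r) = K / (1 + r^2)^beta"""
    return K / (1.0 + dist2) ** beta

for _ in range(steps):
    diff_x = x[None, :, :] - x[:, None, :]          # x_j - x_i
    dist2 = (diff_x ** 2).sum(-1)                    # 成对距离平方
    w = comm_weight(dist2)                           # 成对权重 (N, N)
    diff_v = v[None, :, :] - v[:, None, :]           # v_j - v_i
    dvdt = (w[:, :, None] * diff_v).sum(axis=1) / N  # 速度对齐项
    v = v + dt * dvdt
    x = x + dt * v

# 迭代后各个体速度应趋于一致(速度方差变小)
print("velocity variance:", v.var(axis=0))
```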

【22】 Real-time Informative Surgical Skill Assessment with Gaussian Process Learning 标题:基于高斯过程学习的实时信息外科技能评估 链接:https://arxiv.org/abs/2112.02598

作者:Yangming Li,Randall Bly,Sarah Akkina,Rajeev C. Saxena,Ian Humphreys,Mark Whipple,Kris Moe,Blake Hannaford 机构: Institute of Technology, University of Washington 摘要:鼻内窥镜鼻窦和颅底手术(ESSBS)是一种具有挑战性和潜在危险的外科手术,客观的技能评估是提高外科训练有效性、重新验证外科医生技能、降低手术创伤和手术室并发症发生率的关键组成部分。由于外科手术的复杂性、手术方式的多样性以及新的外科技能的快速发展,外科技能评估仍然是一个具有挑战性的问题。本文提出了一种新的基于高斯过程学习的启发式客观手术技能自动评估方法。与经典的手术技能评估算法不同,该方法1)利用手术器械相对运动的运动学特征,而不是使用特定的手术任务或统计数据来实时评估技能;2) 提供信息反馈,而不是总结分数;3) 能够从新数据中增量学习,而不是依赖于固定的数据集。该方法将仪器运动投影到内窥镜坐标系中,以降低数据的维数。然后提取投影数据的运动学特征,并利用高斯过程学习技术学习手术技能水平与特征之间的关系。所提出的方法在完整的内窥镜颅底和鼻窦手术尸体上得到了验证。这些手术有不同的病理学,需要不同的治疗,有不同的复杂性。实验结果表明,该方法对整个手术过程的预测精度达到100%,对实时预测评估的预测精度达到90%。 摘要:Endoscopic Sinus and Skull Base Surgeries (ESSBSs) is a challenging and potentially dangerous surgical procedure, and objective skill assessment is the key components to improve the effectiveness of surgical training, to re-validate surgeons' skills, and to decrease surgical trauma and the complication rate in operating rooms. Because of the complexity of surgical procedures, the variation of operation styles, and the fast development of new surgical skills, the surgical skill assessment remains a challenging problem. This work presents a novel Gaussian Process Learning-based heuristic automatic objective surgical skill assessment method for ESSBSs. Different with classical surgical skill assessment algorithms, the proposed method 1) utilizes the kinematic features in surgical instrument relative movements, instead of using specific surgical tasks or the statistics to assess skills in real-time; 2) provide informative feedback, instead of a summative scores; 3) has the ability to incrementally learn from new data, instead of depending on a fixed dataset. The proposed method projects the instrument movements into the endoscope coordinate to reduce the data dimensionality. It then extracts the kinematic features of the projected data and learns the relationship between surgical skill levels and the features with the Gaussian Process learning technique. The proposed method was verified in full endoscopic skull base and sinus surgeries on cadavers. These surgeries have different pathology, requires different treatment and has different complexities. The experimental results show that the proposed method reaches 100\% prediction precision for complete surgical procedures and 90\% precision for real-time prediction assessment.
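摘要的核心是"从器械运动的运动学特征到技能评分"的高斯过程回归。下面用scikit-learn给出一个示意草图:特征与标签均为随机生成的占位数据,只演示GP回归的训练与带不确定度的实时预测,并非论文的实际特征工程或数据。

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)

# 占位的运动学特征(如平均速度、加速度、路径长度等,假设6维)
X_train = rng.normal(size=(80, 6))
# 占位的技能分数(约0~1),这里只用一个虚构的线性关系加噪声
y_train = X_train @ rng.normal(size=6) * 0.1 + 0.5 + rng.normal(0, 0.05, 80)

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

# 对新的器械运动片段进行评估:返回预测分数及其不确定度
X_new = rng.normal(size=(3, 6))
mean, std = gp.predict(X_new, return_std=True)
for m, s in zip(mean, std):
    print(f"predicted skill score: {m:.3f} +/- {s:.3f}")
```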

【23】 Multiple Interest and Fine Granularity Network for User Modeling 标题:面向用户建模的多兴趣细粒度网络 链接:https://arxiv.org/abs/2112.02591

作者:Jiaxuan Xie,Jianxiong Wei,Qingsong Hua,Yu Zhang 机构:Alibaba Group, Beijing, China 摘要:用户建模在工业推荐系统中起着基础性作用,无论是在匹配阶段还是在排名阶段,在客户体验和业务收入方面都是如此。如何从用户的历史行为序列中有效提取用户的多个兴趣,以提高推荐结果的相关性和个性化仍然是用户建模的一个开放问题。大多数现有的基于深度学习的方法利用项目ID和类别ID,但忽略了颜色和材质等细粒度特征,这阻碍了对用户兴趣细粒度的建模。在本文中,我们提出了多兴趣和细粒度网络(MFN),它解决了用户的多兴趣和细粒度兴趣,并从用户多兴趣之间的相似关系和组合关系来构建模型。具体来说,对于相似关系的建模,我们利用了两组嵌入,其中一组是来自预训练模型(如GloVe)的固定嵌入,用于给出注意权重,另一组是可训练嵌入,用于与MFN一起训练。对于组合关系的建模,利用自注意层构建不同兴趣表示的高阶组合。在网络构建中,我们设计了一个兴趣提取模块,利用注意机制从用户的历史行为序列中捕获多个兴趣表示,并利用辅助损失提高兴趣表示的区分度。然后应用层次网络对不同粒度的多个兴趣向量与目标项目之间的注意关系进行建模。我们对公共和工业数据集进行评估。实验结果表明,本文提出的MFN方法比现有的其他表示方法具有更好的性能。 摘要:User modeling plays a fundamental role in industrial recommender systems, either in the matching stage and the ranking stage, in terms of both the customer experience and business revenue. How to extract users' multiple interests effectively from their historical behavior sequences to improve the relevance and personalization of the recommend results remains an open problem for user modeling. Most existing deep-learning based approaches exploit item-ids and category-ids but neglect fine-grained features like color and material, which hinders modeling the fine granularity of users' interests. In the paper, we present Multiple interest and Fine granularity Network (MFN), which tackle users' multiple and fine-grained interests and construct the model from both the similarity relationship and the combination relationship among the users' multiple interests. Specifically, for modeling the similarity relationship, we leverage two sets of embeddings, where one is the fixed embedding from pre-trained models (e.g. Glove) to give the attention weights and the other is trainable embedding to be trained with MFN together. For modeling the combination relationship, self-attentive layers are exploited to build the higher order combinations of different interest representations. In the construction of network, we design an interest-extract module using attention mechanism to capture multiple interest representations from user historical behavior sequences and leverage an auxiliary loss to boost the distinction of the interest representations. Then a hierarchical network is applied to model the attention relation between the multiple interest vectors of different granularities and the target item. We evaluate MFN on both public and industrial datasets. The experimental results demonstrate that the proposed MFN achieves superior performance than other existed representing methods.

【24】 Overcome Anterograde Forgetting with Cycled Memory Networks 标题:用循环记忆网络克服顺行遗忘 链接:https://arxiv.org/abs/2112.02342

作者:Jian Peng,Dingqi Ye,Bo Tang,Yinjie Lei,Yu Liu,Haifeng Li 机构: School of Geosciences and Info-Physics, Central South University, Changsha , Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS , USA 备注:14 pages, 15 figures 摘要:终生从一系列任务中学习对于智能体走向人工通用智能至关重要。这要求代理不断地学习和记忆新知识而不受干扰。本文首先论证了利用神经网络进行终身学习的一个基本问题,即顺行遗忘,即保存和转移记忆可能会抑制新知识的学习。这归因于这样一个事实,即神经网络的学习能力会随着它对历史知识的记忆而降低,并且当它将不相关的旧知识转移到当前任务时,可能会出现概念混淆。本研究提出了一个称为循环记忆网络(CMN)的通用框架来解决终身学习神经网络中的顺行遗忘问题。CMN由两个独立的内存网络组成,用于存储短期和长期内存,以避免容量缩减。设计了一个转移单元来连接这两个记忆网络,使知识从长期记忆网络转移到短期记忆网络,以减轻概念上的混淆,并开发了一种记忆整合机制,将短期知识整合到长期记忆网络中以积累知识。实验结果表明,CMN能够有效地解决多个任务相关、任务冲突、类增量和跨域基准上的顺行遗忘问题。 摘要:Learning from a sequence of tasks for a lifetime is essential for an agent towards artificial general intelligence. This requires the agent to continuously learn and memorize new knowledge without interference. This paper first demonstrates a fundamental issue of lifelong learning using neural networks, named anterograde forgetting, i.e., preserving and transferring memory may inhibit the learning of new knowledge. This is attributed to the fact that the learning capacity of a neural network will be reduced as it keeps memorizing historical knowledge, and the fact that conceptual confusion may occur as it transfers irrelevant old knowledge to the current task. This work proposes a general framework named Cycled Memory Networks (CMN) to address the anterograde forgetting in neural networks for lifelong learning. The CMN consists of two individual memory networks to store short-term and long-term memories to avoid capacity shrinkage. A transfer cell is designed to connect these two memory networks, enabling knowledge transfer from the long-term memory network to the short-term memory network to mitigate the conceptual confusion, and a memory consolidation mechanism is developed to integrate short-term knowledge into the long-term memory network for knowledge accumulation. Experimental results demonstrate that the CMN can effectively address the anterograde forgetting on several task-related, task-conflict, class-incremental and cross-domain benchmarks.

【25】 Towards the One Learning Algorithm Hypothesis: A System-theoretic Approach 标题:走向一种学习算法假设:一种系统论方法 链接:https://arxiv.org/abs/2112.02256

作者:Christos Mavridis,John Baras 机构: some of the mostimpactful breakthroughs in machine learning research haveThe authors are with the Department of Electrical and Computer Engineer-ing and the Institute for Systems Research, University of Maryland 备注:arXiv admin note: text overlap with arXiv:2102.05836 摘要:人类认知中普遍存在的学习结构是一个广泛流传的猜想,这一猜想得到了神经科学实验结果的支持。虽然还没有低层次的实现方法,但人们认为人类感知和学习的抽象轮廓包含三个基本属性:(a)层次注意和处理,(b)基于记忆的知识表示,以及(c)渐进学习和知识压缩。我们从系统理论的角度来设计这样一个学习体系结构,开发了一个包含三个主要组件的闭环系统:(i)多分辨率分析预处理器,(ii)组不变特征提取器,(iii)基于知识的渐进式学习模块。多分辨率反馈回路用于学习,即使系统参数适应在线观测。为了设计(i)和(ii),我们基于已建立的基于小波的多分辨率分析理论和群卷积算子的性质。关于(iii),我们介绍了一种新的学习算法,该算法以多分辨率构造逐步增长的知识表示。该算法是基于退火优化的在线确定性退火(ODA)算法的扩展,采用无梯度随机近似求解。ODA具有固有的鲁棒性和正则化特性,并提供了一种通过直观的分岔现象逐步增加学习模型复杂性的方法,即根据需要增加神经元的数量。提出的多分辨率方法具有层次性、渐进性、基于知识和可解释性。我们在最先进的学习算法和深度学习方法的背景下说明了所提出的体系结构的特性。 摘要:The existence of a universal learning architecture in human cognition is a widely spread conjecture supported by experimental findings from neuroscience. While no low-level implementation can be specified yet, an abstract outline of human perception and learning is believed to entail three basic properties: (a) hierarchical attention and processing, (b) memory-based knowledge representation, and (c) progressive learning and knowledge compaction. We approach the design of such a learning architecture from a system-theoretic viewpoint, developing a closed-loop system with three main components: (i) a multi-resolution analysis pre-processor, (ii) a group-invariant feature extractor, and (iii) a progressive knowledge-based learning module. Multi-resolution feedback loops are used for learning, i.e., for adapting the system parameters to online observations. To design (i) and (ii), we build upon the established theory of wavelet-based multi-resolution analysis and the properties of group convolution operators. Regarding (iii), we introduce a novel learning algorithm that constructs progressively growing knowledge representations in multiple resolutions. The proposed algorithm is an extension of the Online Deterministic Annealing (ODA) algorithm based on annealing optimization, solved using gradient-free stochastic approximation. ODA has inherent robustness and regularization properties and provides a means to progressively increase the complexity of the learning model i.e. the number of the neurons, as needed, through an intuitive bifurcation phenomenon. The proposed multi-resolution approach is hierarchical, progressive, knowledge-based, and interpretable. We illustrate the properties of the proposed architecture in the context of the state-of-the-art learning algorithms and deep learning methods.

【26】 SHAPr: An Efficient and Versatile Membership Privacy Risk Metric for Machine Learning 标题:Shapr:一种高效通用的机器学习成员隐私风险度量 链接:https://arxiv.org/abs/2112.02230

作者:Vasisht Duddu,Sebastian Szyller,N. Asokan 机构:University of Waterloo, Waterloo, Canada, Aalto University, Espoo, Finland 摘要:用于训练机器学习(ML)模型的数据可能是敏感的。成员身份推断攻击(MIAs),试图确定某个特定数据记录是否用于训练ML模型,可能会侵犯成员身份隐私。ML模型构建者需要一个原则性的度量定义,使他们能够量化(a)个人训练数据记录,(b)独立于特定MIA,(c)有效的隐私风险。先前关于会员隐私风险度量的工作没有一项同时满足所有这些标准。我们提出了这样一个度量,SHAPr,它使用Shapley值,通过测量单个训练数据记录对模型效用的影响来量化模型对该记录的记忆。这种记忆程度衡量了MIA成功的可能性。使用十个基准数据集,我们表明SHAPr在估计训练数据记录对MIA的易感性方面是有效的(精度:0.94$\pm$0.06,召回率:0.88$\pm$0.06),并且是高效的(对于较小的数据集可在几分钟内计算,对于最大的数据集可在约90分钟内计算)。SHAPr的用途也很广泛,因为它可以用于其他目的,如评估公平性或为数据集的子集指定估值。例如,我们表明SHAPr正确地捕获了不同子组对MIA的不成比例的脆弱性。使用SHAPr,我们表明,通过删除高风险训练数据记录,数据集的成员隐私风险不一定会得到改善,从而在显著扩展的环境中(在10个数据集中,删除高达50%的数据)确认先前工作的观察结果。 摘要:Data used to train machine learning (ML) models can be sensitive. Membership inference attacks (MIAs), attempting to determine whether a particular data record was used to train an ML model, risk violating membership privacy. ML model builders need a principled definition of a metric that enables them to quantify the privacy risk of (a) individual training data records, (b) independently of specific MIAs, (c) efficiently. None of the prior work on membership privacy risk metrics simultaneously meets all of these criteria. We propose such a metric, SHAPr, which uses Shapley values to quantify a model's memorization of an individual training data record by measuring its influence on the model's utility. This memorization is a measure of the likelihood of a successful MIA. Using ten benchmark datasets, we show that SHAPr is effective (precision: 0.94 $\pm$ 0.06, recall: 0.88 $\pm$ 0.06) in estimating susceptibility of a training data record for MIAs, and is efficient (computable within minutes for smaller datasets and in ~90 minutes for the largest dataset). SHAPr is also versatile in that it can be used for other purposes like assessing fairness or assigning valuation for subsets of a dataset. For example, we show that SHAPr correctly captures the disproportionate vulnerability of different subgroups to MIAs. Using SHAPr, we show that the membership privacy risk of a dataset is not necessarily improved by removing high risk training data records, thereby confirming an observation from prior work in a significantly extended setting (in ten datasets, removing up to 50% of data).
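SHAPr的出发点是用Shapley值衡量单条训练样本对模型效用的贡献。下面给出一个与该论文无关的通用示意:用蒙特卡洛置换近似"数据Shapley值",即沿随机排列逐条加入样本,记录验证精度的边际提升。数据、模型与置换次数均为占位假设,并非SHAPr论文中的高效实现。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=220, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=100, random_state=0)

def utility(idx):
    """用索引子集训练模型,返回验证集精度(子集太小或只含单一类别时记为0.5)。"""
    if len(idx) < 5 or len(np.unique(y_tr[idx])) < 2:
        return 0.5
    clf = LogisticRegression(max_iter=200).fit(X_tr[idx], y_tr[idx])
    return clf.score(X_val, y_val)

rng = np.random.default_rng(0)
n = len(X_tr)
shapley = np.zeros(n)
n_perm = 10  # 置换次数越多,估计越准

for _ in range(n_perm):
    perm = rng.permutation(n)
    prev_u = utility(np.array([], dtype=int))
    for k, i in enumerate(perm):
        cur_u = utility(perm[: k + 1])
        shapley[i] += cur_u - prev_u   # 样本 i 在该排列下的边际贡献
        prev_u = cur_u

shapley /= n_perm
print("估计的逐样本 Shapley 值(前10条):", np.round(shapley[:10], 4))
```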

【27】 A Methodology for Thermal Simulation of Interconnects Enabled by Model Reduction with Material Property Variation 标题:基于材料特性变化模型降阶的互连线热模拟方法 链接:https://arxiv.org/abs/2112.03023

作者:Wangkun Jia,Ming-C. Cheng 机构:Department of Electrical and Computer Engineering, Clarkson University, Potsdam, NY ,- 备注:23 pages, 15 figures 摘要:提出了一种基于数据驱动学习算法的互连热模拟方法,该算法考虑了材料特性、热源和边界条件(BCs)的变化。该方法基于模型降阶和区域分解的概念来构造多块方法。建立了一个通用块模型来表示一组互连块,这些互连块用于连接集成电路(IC)中的标准单元。该组中的块具有与各种金属/通孔布线相同的几何形状。因此,除了热源和BCs的变化外,数据驱动的模型简化方法还可用于学习块中不同金属/通孔布线引起的材料性能变化。该方法在两种截然不同的环境下进行研究。首次将其应用于单个互连块的热模拟,其BCs与通用块训练中的BCs相似。然后在FinFET IC的多块热模拟中实现,其中互连结构被划分为几个块,每个块由通用块模型建模。根据金属/通孔布线、BCs和块接口处的热不连续性检查通用块模型的准确性。 摘要:A thermal simulation methodology is developed for interconnects enabled by a data-driven learning algorithm accounting for variations of material properties, heat sources and boundary conditions (BCs). The methodology is based on the concepts of model order reduction and domain decomposition to construct a multi-block approach. A generic block model is built to represent a group of interconnect blocks that are used to wire standard cells in the integrated circuits (ICs). The blocks in this group possess identical geometry with various metal/via routings. The data-driven model reduction method is thus applied to learn material property variations induced by different metal/via routings in the blocks, in addition to the variations of heat sources and BCs. The approach is investigated in two very different settings. It is first applied to thermal simulation of a single interconnect block with similar BCs to those in the training of the generic block. It is then implemented in multi-block thermal simulation of a FinFET IC, where the interconnect structure is partitioned into several blocks each modeled by the generic block model. Accuracy of the generic block model is examined in terms of the metal/via routings, BCs and thermal discontinuities at the block interfaces.

【28】 Local Adaptivity of Gradient Boosting in Histogram Transform Ensemble Learning 标题:直方图变换集成学习中梯度增强的局部自适应性 链接:https://arxiv.org/abs/2112.02589

作者:Hanyuan Hang 机构:Department of Applied Mathematics, University of Twente, The Netherlands 摘要:在本文中,我们提出了一种用于回归的梯度增强算法,称为自适应提升直方图变换(adaptive boosting histogram transform,ABHT),以说明梯度增强算法在直方图变换集成学习中的局部适应性。从理论上看,当目标函数位于局部Hölder连续空间中时,我们证明了我们的ABHT可以过滤出具有不同光滑度的区域。因此,我们能够证明ABHT收敛速度的上界严格小于并行集成直方图变换(parallel ensemble histogram transform,PEHT)的下界。在实验中,合成数据和真实数据实验都验证了理论结果,这表明我们的ABHT具有优越的性能和局部自适应性。 摘要:In this paper, we propose a gradient boosting algorithm called adaptive boosting histogram transform (ABHT) for regression to illustrate the local adaptivity of gradient boosting algorithms in histogram transform ensemble learning. From the theoretical perspective, when the target function lies in a locally Hölder continuous space, we show that our ABHT can filter out the regions with different orders of smoothness. Consequently, we are able to prove that the upper bound of the convergence rates of ABHT is strictly smaller than the lower bound of parallel ensemble histogram transform (PEHT). In the experiments, both synthetic and real-world data experiments empirically validate the theoretical results, which demonstrates the advantageous performance and local adaptivity of our ABHT.
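为直观说明"直方图变换 + 梯度提升"这一类方法的思路,下面给出一个一维回归的最小草图:每一轮把输入随机平移后切成等宽直方图,用分桶残差均值作为弱学习器,再以学习率累加。这只是对该类方法的示意性理解,参数与数据均为虚构,并非论文中ABHT的精确定义。

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 400)
y = np.sin(8 * x) + 0.1 * rng.normal(size=400)   # 目标函数:不同区域光滑度不同

n_rounds, n_bins, lr = 50, 16, 0.3
predictors = []          # 每一轮保存 (平移量, 桶边界, 各桶取值)
residual = y.copy()
pred = np.zeros_like(y)

for _ in range(n_rounds):
    shift = rng.uniform(0, 1.0 / n_bins)                 # 随机直方图变换(这里只做随机平移)
    edges = np.linspace(0, 1, n_bins + 1) + shift
    bins = np.clip(np.digitize(x, edges), 0, n_bins)     # 每个样本落入的桶
    values = np.zeros(n_bins + 1)
    for b in np.unique(bins):
        values[b] = residual[bins == b].mean()           # 弱学习器:桶内残差均值
    update = lr * values[bins]
    pred += update
    residual -= update
    predictors.append((shift, edges, values))

print("训练均方误差:", float(((y - pred) ** 2).mean()))
```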

【29】 Artificial Intelligence and Machine Learning in Nuclear Physics 标题:核物理中的人工智能与机器学习 链接:https://arxiv.org/abs/2112.02309

作者:Amber Boehnlein,Markus Diefenthaler,Cristiano Fanelli,Morten Hjorth-Jensen,Tanja Horn,Michelle P. Kuchera,Dean Lee,Witold Nazarewicz,Kostas Orginos,Peter Ostroumov,Long-Gang Pang,Alan Poon,Nobuo Sato,Malachi Schram,Alexander Scheinker,Michael S. Smith,Xin-Nian Wang,Veronique Ziegler 机构:Thomas Jefferson National Accelerator Facility, Jefferson Avenue, Newport News, Virginia, USA, Laboratory for Nuclear Science, Massachusetts Institute of Technology, Cambridge, MA , The NSF AI Institute for Artificial Intelligence and Fundamental Interactions 备注:Comments are welcome 摘要:人工智能/机器学习方法的进步提供了在科学研究中具有广泛适用性的工具。这些技术正在核物理研究主题的多样性中得到应用,从而促进科学发现和社会应用。本文综述了人工智能和机器学习技术对核物理研究的影响。 摘要:Advances in artificial intelligence/machine learning methods provide tools that have broad applicability in scientific research. These techniques are being applied across the diversity of nuclear physics research topics, leading to advances that will facilitate scientific discoveries and societal applications. This Review gives a snapshot of nuclear physics research which has been transformed by artificial intelligence and machine learning techniques.

【30】 Learning to Search in Local Branching 标题:在局部分支中学习搜索 链接:https://arxiv.org/abs/2112.02195

作者:Defeng Liu,Matteo Fischetti,Andrea Lodi 机构:Canada Excellence Research Chair, Polytechnique Montréal, Department of Information Engineering, University of Padova, Jacobs Technion-Cornell Institute, Cornell University 摘要:寻找混合整数线性规划问题的高质量解对于许多实际应用具有重要意义。在这方面,提出了细化启发式局部分支(LB)来产生改进的解,并对MILP中局部搜索方法的发展产生了很大的影响。该算法迭代地探索由所谓的局部分支约束定义的解邻域序列,即限制与参考解距离的线性不等式。对于LB算法,邻域大小的选择对性能至关重要。虽然在原始LB方案中它是由保守值初始化的,但我们的新观察结果是,最佳大小强烈依赖于特定的MILP实例。在这项工作中,我们研究了搜索邻域的大小与底层LB算法行为之间的关系,并设计了一个基于学习的框架来指导LB启发式算法的邻域搜索。该框架包括两个阶段的战略。对于第一阶段,通过一个回归任务,训练一个比例回归模型,在第一次迭代时预测LB邻域的大小。在第二阶段,我们利用强化学习,设计一种强化邻域搜索策略,在后续迭代中动态调整大小。我们的计算表明,邻域大小确实是可以学习的,从而提高了性能,并且整个算法在实例大小方面以及在实例之间都具有良好的通用性。 摘要:Finding high-quality solutions to mixed-integer linear programming problems (MILPs) is of great importance for many practical applications. In this respect, the refinement heuristic local branching (LB) has been proposed to produce improving solutions and has been highly influential for the development of local search methods in MILP. The algorithm iteratively explores a sequence of solution neighborhoods defined by the so-called local branching constraint, namely, a linear inequality limiting the distance from a reference solution. For a LB algorithm, the choice of the neighborhood size is critical to performance. Although it was initialized by a conservative value in the original LB scheme, our new observation is that the best size is strongly dependent on the particular MILP instance. In this work, we investigate the relation between the size of the search neighborhood and the behavior of the underlying LB algorithm, and we devise a learning-based framework for guiding the neighborhood search of the LB heuristic. The framework consists of a two-phase strategy. For the first phase, a scaled regression model is trained to predict the size of the LB neighborhood at the first iteration through a regression task. In the second phase, we leverage reinforcement learning and devise a reinforced neighborhood search strategy to dynamically adapt the size at the subsequent iterations. We computationally show that the neighborhood size can indeed be learned, leading to improved performances and that the overall algorithm generalizes well both with respect to the instance size and, remarkably, across instances.

其他(39篇)

【1】 Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention 标题:在CommonsenseQA上达到人类水平:用外部注意力增强自注意力 链接:https://arxiv.org/abs/2112.03254

作者:Yichong Xu,Chenguang Zhu,Shuohang Wang,Siqi Sun,Hao Cheng,Xiaodong Liu,Jianfeng Gao,Pengcheng He,Michael Zeng,Xuedong Huang 机构:Microsoft Corporation 备注:11 pages, 1 figure, 7 tables 摘要:当今大多数人工智能系统都专注于在大量不同数据上使用自我关注机制和转换器架构,以获得令人印象深刻的性能提升。在本文中,我们建议使用外部注意机制来增强transformer体系结构,以带来外部知识和上下文。通过将外部信息集成到预测过程中,我们希望减少对更大模型的需求,并提高人工智能系统的民主化程度。我们发现,所提出的外部注意机制可以显著改善现有的人工智能系统的性能,允许从业者容易地将基础AI模型定制到许多不同的下游应用。特别是,我们关注于常识推理的任务,证明所提出的外部注意机制可以增强现有的Transformer模型,并显著提高模型的推理能力。所提议的系统,知识外部注意推理(KEAR),在开放常识QA研究基准上达到人类平等,准确率为89.4\%,而人类准确率为88.9\%。 摘要:Most of today's AI systems focus on using self-attention mechanisms and transformer architectures on large amounts of diverse data to achieve impressive performance gains. In this paper, we propose to augment the transformer architecture with an external attention mechanism to bring external knowledge and context to bear. By integrating external information into the prediction process, we hope to reduce the need for ever-larger models and increase the democratization of AI systems. We find that the proposed external attention mechanism can significantly improve the performance of existing AI systems, allowing practitioners to easily customize foundation AI models to many diverse downstream applications. In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities. The proposed system, Knowledge External Attention for Reasoning (KEAR), reaches human parity on the open CommonsenseQA research benchmark with an accuracy of 89.4\% in comparison to the human accuracy of 88.9\%.

【2】 On Complexity of 1-Center in Various Metrics 标题:关于不同度量下1-中心的复杂性 链接:https://arxiv.org/abs/2112.03222

作者:Amir Abboud,MohammadHossein Bateni,Vincent Cohen-Addad,Karthik C. S.,Saeed Seddighin 机构:Weizmann Institute of Science, Google Research, Rutgers University, Toyota Technological Institute at Chicago 摘要:我们考虑经典的1-中心问题:给定度量空间中n个点的集合P,找到P中使其到P中其他点的最大距离最小的点。我们研究了该问题在d维$\ell_p$度量中,以及在长度为d的字符串上的编辑(edit)度量和Ulam度量中的复杂性。我们关于1-中心问题的结果可按d分类如下。$\bullet$ 小d:我们给出了固定维$\ell_1$度量中1-中心问题的第一个线性时间算法。另一方面,假设命中集猜想(HSC),我们证明当$d=\omega(\log n)$时,在任何$\ell_p$度量中,或在编辑或Ulam度量中,都没有次二次算法可以解决1-中心问题。$\bullet$ 大d:当$d=\Omega(n)$时,我们扩展了条件下界,以排除编辑度量中1-中心问题的次四次算法(假设量化SETH)。另一方面,我们给出了Ulam度量中1-中心问题的$(1+\epsilon)$-近似算法,运行时间为$\tilde{O}_{\epsilon}(nd+n^2\sqrt{d})$。我们还通过允许近似或降低维数d来加强上述一些下界,但仅针对列出所有必需解的较弱算法类。此外,我们将其中一个硬度结果加以扩展,以排除编辑度量中经过充分研究的1-中值问题的次四次算法:给定一组长度为n的n个字符串,目标是在该集合中找到一个字符串,使其到集合中其余字符串的编辑距离之和最小。 摘要:We consider the classic 1-center problem: Given a set P of n points in a metric space find the point in P that minimizes the maximum distance to the other points of P. We study the complexity of this problem in d-dimensional $\ell_p$-metrics and in edit and Ulam metrics over strings of length d. Our results for the 1-center problem may be classified based on d as follows. $\bullet$ Small d: We provide the first linear-time algorithm for 1-center problem in fixed-dimensional $\ell_1$ metrics. On the other hand, assuming the hitting set conjecture (HSC), we show that when $d=\omega(\log n)$, no subquadratic algorithm can solve 1-center problem in any of the $\ell_p$-metrics, or in edit or Ulam metrics. $\bullet$ Large d. When $d=\Omega(n)$, we extend our conditional lower bound to rule out subquartic algorithms for 1-center problem in edit metric (assuming Quantified SETH). On the other hand, we give a $(1+\epsilon)$-approximation for 1-center in Ulam metric with running time $\tilde{O}_{\epsilon}(nd+n^2\sqrt{d})$. We also strengthen some of the above lower bounds by allowing approximations or by reducing the dimension d, but only against a weaker class of algorithms which list all requisite solutions. Moreover, we extend one of our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where given a set of n strings each of length n, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.
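作为参照,1-中心问题本身可以直接写成一个暴力解法:对每个点计算它到其余点的最大距离,取最小者。下面给出$\ell_1$度量下的numpy草图(复杂度O(n^2 d)),摘要讨论的正是这一二次复杂度能否被突破;数据为随机示例。

```python
import numpy as np

def one_center_l1(P: np.ndarray) -> int:
    """返回 P 中使'到其余点的最大 l1 距离'最小的点的下标(暴力 O(n^2 d))。"""
    # dists[i, j] = ||P_i - P_j||_1
    dists = np.abs(P[:, None, :] - P[None, :, :]).sum(axis=-1)
    eccentricity = dists.max(axis=1)       # 每个点的最大距离(离心率)
    return int(eccentricity.argmin())

rng = np.random.default_rng(4)
P = rng.normal(size=(200, 3))
c = one_center_l1(P)
print("1-center index:", c, " radius:", float(np.abs(P - P[c]).sum(axis=1).max()))
```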

【3】 Contextual Bandit Applications in Customer Support Bot 标题:客户支持Bot中的上下文Bandit应用程序 链接:https://arxiv.org/abs/2112.03210

作者:Sandra Sajeev,Jade Huang,Nikos Karampatziakis,Matthew Hall,Sebastian Kochman,Weizhu Chen 机构:Microsoft Azure AI 备注:None 摘要:虚拟支持代理作为企业提供更好、更易访问的客户服务的一种方式越来越受欢迎。该领域的一些挑战包括含糊不清的用户查询以及不断变化的支持主题和用户行为(非平稳性)。但是,我们可以访问用户提供的部分反馈(点击、调查和其他事件),这些反馈可以用来改善用户体验。适应性学习技术,如上下文强盗,自然适合这种问题设置。在本文中,我们将讨论Microsoft虚拟代理的上下文bandits(CB)的实际实现。它包括基于神经线性土匪(NLB)的意图消歧和基于多武装土匪(MAB)集合的上下文建议。我们的解决方案已部署到生产中,并改进了Microsoft virtual agent的关键业务指标,A/B实验证实了这一点。结果包括问题解决率相对提高12%以上,向人工操作员升级的相对降低4%以上。虽然我们当前的用例关注于意图消除歧义和上下文推荐以支持机器人,但我们相信我们的方法可以扩展到其他领域。 摘要:Virtual support agents have grown in popularity as a way for businesses to provide better and more accessible customer service. Some challenges in this domain include ambiguous user queries as well as changing support topics and user behavior (non-stationarity). We do, however, have access to partial feedback provided by the user (clicks, surveys, and other events) which can be leveraged to improve the user experience. Adaptable learning techniques, like contextual bandits, are a natural fit for this problem setting. In this paper, we discuss real-world implementations of contextual bandits (CB) for the Microsoft virtual agent. It includes intent disambiguation based on neural-linear bandits (NLB) and contextual recommendations based on a collection of multi-armed bandits (MAB). Our solutions have been deployed to production and have improved key business metrics of the Microsoft virtual agent, as confirmed by A/B experiments. Results include a relative increase of over 12% in problem resolution rate and relative decrease of over 4% in escalations to a human operator. While our current use cases focus on intent disambiguation and contextual recommendation for support bots, we believe our methods can be extended to other domains.
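摘要中的上下文推荐基于一组多臂老虎机。下面给出一个与微软该系统无关的极简示意:用Thompson采样(Beta-Bernoulli)在若干候选动作间做探索与利用,以"是否点击"作为部分反馈。候选数与点击率均为虚构,仅演示机制。

```python
import numpy as np

rng = np.random.default_rng(5)
true_ctr = np.array([0.05, 0.12, 0.08])   # 各候选动作的真实点击率(未知,仅用于模拟环境)
K = len(true_ctr)
alpha = np.ones(K)                         # Beta 后验参数:成功数 + 1
beta = np.ones(K)                          # Beta 后验参数:失败数 + 1

clicks = 0
for t in range(5000):
    theta = rng.beta(alpha, beta)          # Thompson 采样:从各臂后验中抽样
    a = int(theta.argmax())                # 选择抽样值最大的动作
    reward = rng.random() < true_ctr[a]    # 模拟用户是否点击
    alpha[a] += reward
    beta[a] += 1 - reward
    clicks += reward

print("总点击:", clicks)
print("各臂后验均值:", np.round(alpha / (alpha + beta), 3))  # 应接近 true_ctr,且多数流量流向最优臂
```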

【4】 Anchoring to Exemplars for Training Mixture-of-Expert Cell Embeddings 标题:锚定到样本以训练混合专家细胞嵌入 链接:https://arxiv.org/abs/2112.03208

作者:Siqi Wang,Manyuan Lu,Nikita Moshkov,Juan C. Caicedo,Bryan A. Plummer 机构:Boston University, Biological Research Centre, Broad Institute of MIT and Harvard 摘要:在显微镜图像中分析细胞的形态可以深入了解化合物的机理或基因的功能。解决这一任务需要的方法不仅可以从图像中提取生物信息,而且可以忽略技术变化,即实验程序的变化或用于收集显微镜图像的设备之间的差异。我们提出了混合专家(团队)的治疗范例,这是一种嵌入学习方法,它学习一组专家,这些专家专门捕捉我们训练集中的技术变化,然后在测试时聚合专家的预测。因此,团队可以通过最小化每个专家发出的噪音,以较少的技术偏差学习强大的嵌入。为了训练我们的模型,我们利用处理示例,使我们的方法能够在每个小批量中捕获整个数据集的分布,同时仍然适合GPU内存。我们在三个数据集上对我们的方法进行了评估,这些数据集用于药物发现等任务,与最新技术相比,在确定细胞治疗的真正作用机制方面提高了5.5-11%。 摘要:Analyzing the morphology of cells in microscopy images can provide insights into the mechanism of compounds or the function of genes. Addressing this task requires methods that can not only extract biological information from the images, but also ignore technical variations, ie, changes in experimental procedure or differences between equipments used to collect microscopy images. We propose Treatment ExemplArs with Mixture-of-experts (TEAMs), an embedding learning approach that learns a set of experts that are specialized in capturing technical variations in our training set and then aggregates specialist's predictions at test time. Thus, TEAMs can learn powerful embeddings with less technical variation bias by minimizing the noise from every expert. To train our model, we leverage Treatment Exemplars that enable our approach to capture the distribution of the entire dataset in every minibatch while still fitting into GPU memory. We evaluate our approach on three datasets for tasks like drug discovery, boosting performance on identifying the true mechanism of action of cell treatments by 5.5-11% over the state-of-the-art.

【5】 Player of Games 标题:游戏玩家(Player of Games) 链接:https://arxiv.org/abs/2112.03178

作者:Martin Schmid,Matej Moravcik,Neil Burch,Rudolf Kadlec,Josh Davidson,Kevin Waugh,Nolan Bard,Finbarr Timbers,Marc Lanctot,Zach Holland,Elnaz Davoodi,Alden Christianson,Michael Bowling 机构:Matej Moravˇc´ık, DeepMind∗ 摘要:游戏作为人工智能进步的基准由来已久。最近,使用搜索和学习的方法在一组完美信息博弈中表现出强大的性能,而使用博弈论推理和学习的方法在特定的不完美信息扑克变体中表现出强大的性能。我们介绍了游戏玩家,这是一种通用算法,它结合了引导搜索、自我游戏学习和博弈论推理,统一了以前的方法。Player of Games是第一个在大型完美和不完美信息博弈中获得强大经验性能的算法,这是迈向真正适用于任意环境的通用算法的重要一步。我们证明了游戏者是健全的,随着可用计算时间和近似容量的增加,会收敛到完美游戏者。游戏玩家在国际象棋和围棋中表现出色,击败了最强大的公开代理——德克萨斯州无限制扑克(Slumbot),并击败了苏格兰场最先进的代理——一款不完美的信息游戏,它展示了引导搜索、学习和博弈论推理的价值。 摘要:Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games -- an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increases. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.

【6】 Feature Disentanglement of Robot Trajectories 标题:机器人轨迹的特征解缠 链接:https://arxiv.org/abs/2112.03164

作者:Matias Valdenegro-Toro,Daniel Harnack,Hendrik Wöhrle 机构: German Research Center for Artificial Intelligence, Bremen, Germany. 备注:5 pages, 3 figures, 1 table, with supplementary 摘要:由机器人关节生成的轨迹建模是复杂的,并且对于轨迹生成、聚类和分类等高级活动是必需的。解缠表征学习有望在无监督学习方面取得进展,但尚未在机器人生成的轨迹上进行评估。在本文中,我们在一个由3自由度机械臂生成的1M机器人轨迹数据集上评估了三种解缠VAE($\beta$-VAE、Decorr VAE和一个新的$\beta$-Decorr VAE)。我们发现,基于去相关的公式在解纠缠度量、轨迹质量以及与真实潜在特征的相关性方面表现最好。我们期望这些结果能增加无监督学习在机器人控制中的应用。 摘要:Modeling trajectories generated by robot joints is complex and required for high level activities like trajectory generation, clustering, and classification. Disentangled representation learning promises advances in unsupervised learning, but they have not been evaluated in robot-generated trajectories. In this paper we evaluate three disentangling VAEs ($\beta$-VAE, Decorr VAE, and a new $\beta$-Decorr VAE) on a dataset of 1M robot trajectories generated from a 3 DoF robot arm. We find that the decorrelation-based formulations perform the best in terms of disentangling metrics, trajectory quality, and correlation with ground truth latent features. We expect that these results increase the use of unsupervised learning in robot control.
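摘要比较的几种解缠VAE都围绕ELBO中KL项的加权或约束展开。下面给出一个$\beta$-VAE损失的最小numpy草图(给定编码器输出的均值与对数方差,以及重构误差),仅说明$\beta$如何权衡重构与解缠;潜变量维度与数值均为虚构,并非任何一篇论文的完整实现。

```python
import numpy as np

def beta_vae_loss(recon_error: float, mu: np.ndarray, logvar: np.ndarray, beta: float = 4.0) -> float:
    """beta-VAE 目标:重构误差 + beta * KL(q(z|x) || N(0, I))。"""
    # 对角高斯与标准正态之间的 KL 散度(逐维求和)
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return recon_error + beta * kl

# 假设某条轨迹样本经过编码器得到的 8 维潜变量统计量
rng = np.random.default_rng(6)
mu, logvar = rng.normal(0, 0.5, 8), rng.normal(-1, 0.3, 8)
recon = 12.3   # 假设的重构误差(如 MSE 之和)

for b in (1.0, 4.0):  # beta=1 退化为普通 VAE,beta>1 更强调解缠
    print(f"beta={b}: loss={beta_vae_loss(recon, mu, logvar, b):.3f}")
```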

【7】 Distilled Domain Randomization 标题:蒸馏域随机化 链接:https://arxiv.org/abs/2112.03149

作者:Julien Brosseit,Benedikt Hahner,Fabio Muratore,Michael Gienger,Jan Peters 机构:Fabio Muratore and Michael Gienger are with the Honda Research Institute Europe 备注:shared first authorship between Julien Brosseit, Benedikt Hahner, and Fabio Muratore 摘要:深度强化学习是从零开始学习机器人控制策略的有效工具。然而,这些方法因所需的大量训练数据而臭名昭著,而在真实机器人上收集这些数据的成本高得令人望而却步。一种非常流行的替代方法是从模拟中学习,从而更快、更安全、更便宜地生成数据。由于所有模拟器都只是现实的模型,模拟数据和真实数据之间不可避免地存在差异,通常被称为“现实差距”。为了弥补这一差距,许多方法从模拟器的分布中学习一个策略。在本文中,我们建议将随机物理模拟中的强化学习与策略蒸馏相结合。我们的算法称为“蒸馏域随机化”(DiDoR),它将所谓的教师策略(即在最初采样的各个域上的专家策略)蒸馏到一个随后被部署的学生策略中。通过这种方式,DiDoR学习直接从模拟迁移到现实的控制器,即不需要目标域的数据。我们在三个sim-to-sim以及两个sim-to-real实验中将DiDoR与三个基线进行了比较。我们的研究结果表明,用DiDoR训练的策略的目标域性能与基线相当或更好。此外,我们的方法既不增加所需的内存容量,也不增加计算动作的时间,而这两点很可能是成功部署学习控制器的失败点。 摘要:Deep reinforcement learning is an effective tool to learn robot control policies from scratch. However, these methods are notorious for the enormous amount of required training data which is prohibitively expensive to collect on real robots. A highly popular alternative is to learn from simulations, allowing to generate the data much faster, safer, and cheaper. Since all simulators are mere models of reality, there are inevitable differences between the simulated and the real data, often referenced as the 'reality gap'. To bridge this gap, many approaches learn one policy from a distribution over simulators. In this paper, we propose to combine reinforcement learning from randomized physics simulations with policy distillation. Our algorithm, called Distilled Domain Randomization (DiDoR), distills so-called teacher policies, which are experts on domains that have been sampled initially, into a student policy that is later deployed. This way, DiDoR learns controllers which transfer directly from simulation to reality, i.e., without requiring data from the target domain. We compare DiDoR against three baselines in three sim-to-sim as well as two sim-to-real experiments. Our results show that the target domain performance of policies trained with DiDoR is on par with or better than the baselines'. Moreover, our approach neither increases the required memory capacity nor the time to compute an action, which may well be a point of failure for successfully deploying the learned controller.

【8】 Properties of Minimizing Entropy 标题:熵最小化的性质 链接:https://arxiv.org/abs/2112.03143

作者:Xu Ji,Lena Nehale-Ezzine,Maksym Korablyov 机构:Mila, Quebec AI Institute 摘要:紧凑的数据表示是改进学习函数泛化的一种方法。我们明确说明了熵和基数之间的关系,这两种紧性度量,包括前者的梯度下降如何减少后者。熵是分布敏感的,基数则不是。我们提出了第三个紧性度量,它是两个度量之间的折衷:期望基数,或在任何有限个抽取中的唯一状态的期望数量,这比标准基数更有意义,因为它以可忽略的概率质量折扣状态。我们证明了最小化熵也最小化期望基数。 摘要:Compact data representations are one approach for improving generalization of learned functions. We explicitly illustrate the relationship between entropy and cardinality, both measures of compactness, including how gradient descent on the former reduces the latter. Whereas entropy is distribution sensitive, cardinality is not. We propose a third compactness measure that is a compromise between the two: expected cardinality, or the expected number of unique states in any finite number of draws, which is more meaningful than standard cardinality as it discounts states with negligible probability mass. We show that minimizing entropy also minimizes expected cardinality.
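可以用一个很短的数值例子说明三种紧致性度量的差别:熵对概率分布敏感,基数不敏感,而期望基数(有限次抽样中期望出现的不同状态数)介于两者之间,其闭式为 $\mathbb{E}[\text{unique}] = \sum_i \bigl(1-(1-p_i)^n\bigr)$。下面的numpy草图按此计算,分布为虚构示例。

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def cardinality(p):
    return int((p > 0).sum())

def expected_cardinality(p, n):
    """n 次独立抽样中期望出现的不同状态数:sum_i 1 - (1 - p_i)^n"""
    return float((1.0 - (1.0 - p) ** n).sum())

# 两个支持集相同(基数相同)但集中程度不同的分布
p_uniform = np.full(10, 0.1)
p_peaked = np.array([0.91] + [0.01] * 9)

for name, p in [("uniform", p_uniform), ("peaked", p_peaked)]:
    print(name,
          "entropy=%.3f" % entropy(p),
          "cardinality=%d" % cardinality(p),
          "expected card (n=20)=%.2f" % expected_cardinality(p, 20))
# peaked 分布:基数仍为 10,但熵与期望基数都显著更小
```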

【9】 Ethics and Creativity in Computer Vision 标题:计算机视觉中的伦理与创新 链接:https://arxiv.org/abs/2112.03111

作者:Negar Rostamzadeh,Emily Denton,Linda Petrini 机构:Google Research, Montreal, New York 备注:None 摘要:本文回顾了我们在CVPR 2021会议上组织研讨会《计算机视觉创造性应用中的伦理考虑》,以及此前在ECCV 2018、ICCV 2019和CVPR 2020上组织系列研讨会《面向时尚、艺术与设计的计算机视觉》所获得的经验与收获。我们希望这一反思将使艺术家和机器学习研究人员围绕计算机视觉创造性应用的伦理和社会层面展开对话。 摘要:This paper offers a retrospective of what we learnt from organizing the workshop *Ethical Considerations in Creative applications of Computer Vision* at CVPR 2021 conference and, prior to that, a series of workshops on *Computer Vision for Fashion, Art and Design* at ECCV 2018, ICCV 2019, and CVPR 2020. We hope this reflection will bring artists and machine learning researchers into conversation around the ethical and social dimensions of creative applications of computer vision.

【10】 HTMOT : Hierarchical Topic Modelling Over Time 标题:HTMOT:随时间推移的分层主题建模 链接:https://arxiv.org/abs/2112.03104

作者:Judicael Poumay,Ashwin Ittoo 机构:ULiegeHEC Liege Rue louvrex , Liege, Belgium 摘要:多年来,主题模型提供了一种从文本中提取见解的有效方法。然而,尽管已经提出了许多模型,但没有一个能够对主题的时间性和层次性进行联合建模。建模时间通过分离词汇上相近但时间上不同的主题提供了更精确的主题,而建模层次结构提供了文档语料库内容的更详细视图。因此,在本研究中,我们提出了一种新的方法,HTMOT,来执行随时间推移的分层主题建模。我们使用一种新的Gibbs采样实现来训练HTMOT,这更有效。具体来说,我们表明,只有将时间建模应用于深层次主题,才能提取特定的故事或事件,而高层主题则提取语料库中较大的主题。我们的结果表明,我们的训练过程是快速的,可以提取准确的高级主题和时间精确的子主题。我们使用单词入侵任务测量了我们模型的性能,并概述了这种评估方法的一些局限性,特别是对于层次模型。作为一个案例研究,我们重点介绍了2020年航天工业的各种发展情况。 摘要:Over the years, topic models have provided an efficient way of extracting insights from text. However, while many models have been proposed, none are able to model topic temporality and hierarchy jointly. Modelling time provide more precise topics by separating lexically close but temporally distinct topics while modelling hierarchy provides a more detailed view of the content of a document corpus. In this study, we therefore propose a novel method, HTMOT, to perform Hierarchical Topic Modelling Over Time. We train HTMOT using a new implementation of Gibbs sampling, which is more efficient. Specifically, we show that only applying time modelling to deep sub-topics provides a way to extract specific stories or events while high level topics extract larger themes in the corpus. Our results show that our training procedure is fast and can extract accurate high-level topics and temporally precise sub-topics. We measured our model's performance using the Word Intrusion task and outlined some limitations of this evaluation method, especially for hierarchical models. As a case study, we focused on the various developments in the space industry in 2020.

【11】 Scaling Up Influence Functions 标题:放大影响函数 链接:https://arxiv.org/abs/2112.03052

作者:Andrea Schioppa,Polina Zablotskaia,David Vilar,Artem Sokolov 机构:Google Research 备注:Published at AAAI-22 摘要:我们解决了影响函数的有效计算问题,以便将预测跟踪回训练数据。我们提出并分析了一种基于Arnoldi迭代的加速逆Hessian计算的新方法。通过这一改进,据我们所知,我们首次成功实现了影响函数,可扩展到具有数亿个参数的全尺寸(语言和视觉)转换器模型。我们通过数千万到数亿个训练示例来评估我们在图像分类和序列到序列任务方面的方法。我们的代码将在https://github.com/google-research/jax-influence. 摘要:We address efficient calculation of influence functions for tracking predictions back to the training data. We propose and analyze a new approach to speeding up the inverse Hessian calculation based on Arnoldi iteration. With this improvement, we achieve, to the best of our knowledge, the first successful implementation of influence functions that scales to full-size (language and vision) Transformer models with several hundreds of millions of parameters. We evaluate our approach on image classification and sequence-to-sequence tasks with tens to a hundred of millions of training examples. Our code will be available at https://github.com/google-research/jax-influence.
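摘要的关键在于用Arnoldi迭代在低维Krylov子空间内近似逆Hessian-向量积。下面是一个与该论文实现无关的教学性numpy草图:对一个对称正定矩阵A先做m步Arnoldi得到正交基Q与Hessenberg矩阵H,再用 x ≈ Q_m H_m^{-1}(‖v‖ e_1) 近似 A^{-1}v;真实场景中A只需通过Hessian-向量积访问,矩阵与维度均为虚构。

```python
import numpy as np

def arnoldi_inverse_hvp(matvec, v, m=20):
    """用 m 步 Arnoldi 迭代近似 A^{-1} v,其中 A 只通过 matvec(x)=A@x 访问。"""
    n = v.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    Q[:, 0] = v / beta
    for j in range(m):
        w = matvec(Q[:, j])
        for i in range(j + 1):                 # 修正 Gram-Schmidt 正交化
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:                # 提前找到不变子空间
            m = j + 1
            break
        Q[:, j + 1] = w / H[j + 1, j]
    Hm = H[:m, :m]
    e1 = np.zeros(m)
    e1[0] = beta
    y = np.linalg.solve(Hm, e1)                # 只需解一个 m 阶小线性系统
    return Q[:, :m] @ y

# 用一个随机对称正定矩阵检验近似质量(模拟正则化后的 Hessian)
rng = np.random.default_rng(7)
M = rng.normal(size=(300, 300))
A = M @ M.T / 300 + np.eye(300)
v = rng.normal(size=300)

x_approx = arnoldi_inverse_hvp(lambda u: A @ u, v, m=60)
x_exact = np.linalg.solve(A, v)
print("relative error:", np.linalg.norm(x_approx - x_exact) / np.linalg.norm(x_exact))
```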

【12】 D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions 标题:D-GRASP:物理上合理的手-物交互动态抓取综合 链接:https://arxiv.org/abs/2112.03028

作者:Sammy Christen,Muhammed Kocabas,Emre Aksan,Jemin Hwangbo,Jie Song,Otmar Hilliges 机构:Department of Computer Science, ETH Zurich, Max Planck Institute for Intelligent Systems, T¨ubingen, Department of Mechanical Engineering, KAIST, Can - grasp ,- seq , GT , Mustard - grasp , - sequence , Object , Sequence 摘要:我们介绍动态抓取合成任务:给定具有已知6D姿势和抓取参考的对象,我们的目标是生成将对象移动到目标6D姿势的运动。这是一个挑战,因为它需要对人类手的复杂发音以及与物体的复杂物理交互进行推理。我们提出了一种新的方法,在强化学习框架中构建这个问题,并利用物理模拟来学习和评估这种动态交互。分层方法将任务分解为低级抓取和高级运动合成。它可以用来生成新的手序列,这些手序列接近、抓取并将对象移动到所需的位置,同时保持人类的相似性。我们表明,我们的方法导致稳定的抓取,并产生广泛的运动。此外,即使是不完美的标签也可以通过我们的方法进行纠正,以生成动态交互序列。视频可在https://eth-ait.github.io/d-grasp/ . 摘要:We introduce the dynamic grasp synthesis task: given an object with a known 6D pose and a grasp reference, our goal is to generate motions that move the object to a target 6D pose. This is challenging, because it requires reasoning about the complex articulation of the human hand and the intricate physical interaction with the object. We propose a novel method that frames this problem in the reinforcement learning framework and leverages a physics simulation, both to learn and to evaluate such dynamic interactions. A hierarchical approach decomposes the task into low-level grasping and high-level motion synthesis. It can be used to generate novel hand sequences that approach, grasp, and move an object to a desired location, while retaining human-likeness. We show that our approach leads to stable grasps and generates a wide range of motions. Furthermore, even imperfect labels can be corrected by our method to generate dynamic interaction sequences. Video is available at https://eth-ait.github.io/d-grasp/ .

【13】 Nonstochastic Bandits with Composite Anonymous Feedback 标题:具有复合匿名反馈的非随机Bandit 链接:https://arxiv.org/abs/2112.02866

作者:Nicolò Cesa-Bianchi,Tommaso Cesari,Roberto Colomboni,Claudio Gentile,Yishay Mansour 机构:Institut de Mathématiques de Toulouse (IMT), Paul Sabatier University (UT,), Toulouse, France, Toulouse School of Economics (TSE), Toulouse, France, Istituto Italiano di Tecnologia (IIT), Genova, Italy, Universita degli Studi di Milano, Italy 摘要:我们研究了一种非随机(nonstochastic)的bandit场景,在这个场景中,一个动作的损失不会立即由玩家承担,而是以一种对抗的方式分散在随后的回合中。然后,玩家在每轮结束时观察到的瞬时损失是之前玩过的动作的多个损失分量的总和。作为特例,该设置包含了更容易的延迟反馈bandit任务,这是一个经过充分研究的框架,玩家可以在其中单独观察延迟损失。我们的第一个贡献是将一个标准的bandit算法转换成一个可以在更困难的环境下运行的算法:我们根据原始算法的稳定性和遗憾来限制转换算法的遗憾。然后,我们证明了具有Tsallis熵的适当调谐FTRL的变换具有$\sqrt{(d+1)KT}$阶的遗憾,其中$d$是最大延迟,$K$是臂数,$T$是时间范围。最后,我们通过给出任何在该设置下运行的算法的遗憾的匹配(至多相差对数因子)下界,表明我们的结果在一般情况下无法改进。 摘要:We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way. The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions. This setting encompasses as a special case the easier task of bandits with delayed feedback, a well-studied framework where the player observes the delayed losses individually. Our first contribution is a general reduction transforming a standard bandit algorithm into one that can operate in the harder setting: We bound the regret of the transformed algorithm in terms of the stability and regret of the original algorithm. Then, we show that the transformation of a suitably tuned FTRL with Tsallis entropy has a regret of order $\sqrt{(d+1)KT}$, where $d$ is the maximum delay, $K$ is the number of arms, and $T$ is the time horizon. Finally, we show that our results cannot be improved in general by exhibiting a matching (up to a log factor) lower bound on the regret of any algorithm operating in this setting.
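为帮助理解"复合匿名反馈"的设定,下面用几行numpy做一个示意模拟:每次行动的损失被对手拆成 d+1 份摊到后续回合,玩家每回合只观测到各份之和,无法区分来源。臂数、回合数与损失取值均为虚构,策略也只用随机选择代替真实算法。

```python
import numpy as np

rng = np.random.default_rng(8)
K, T, d = 3, 12, 2                       # 3 个臂,12 个回合,最大延迟 d=2

pending = np.zeros(T + d + 1)            # 未来各回合将被观测到的损失份额
observed = np.zeros(T)

for t in range(T):
    a = rng.integers(K)                  # 示意:用随机策略代替真实 bandit 算法
    loss = rng.uniform(0, 1)             # 行动在回合 t 产生的总损失(对玩家不可见,示意中与臂无关)
    # 对手把该损失任意拆成 d+1 份,摊到回合 t, t+1, ..., t+d
    split = rng.dirichlet(np.ones(d + 1)) * loss
    pending[t : t + d + 1] += split
    # 玩家在回合 t 末只观测到"之前各次行动摊到本回合的份额之和"
    observed[t] = pending[t]

print("逐回合观测到的复合损失:", np.round(observed, 3))
```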

【14】 Target Entropy Annealing for Discrete Soft Actor-Critic 标题:离散软角色-批评者的目标熵退火算法 链接:https://arxiv.org/abs/2112.02852

作者:Yaosheng Xu,Dailin Hu,Litian Liang,Stephen McAleer,Pieter Abbeel,Roy Fox 机构:Department of Computer Science, University of California, Irvine, Department of Electrical Engineering and Computer Science, University of California, Berkeley 备注:None 摘要:Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings. It uses the maximum entropy framework for efficiency and stability, and applies a heuristic temperature Lagrange term to tune the temperature $alpha$, which determines how "soft" the policy should be. It is counter-intuitive that empirical evidence shows SAC does not perform well in discrete domains. In this paper we investigate the possible explanations for this phenomenon and propose Target Entropy Scheduled SAC (TES-SAC), an annealing method for the target entropy parameter applied on SAC. Target entropy is a constant in the temperature Lagrange term and represents the target policy entropy in discrete SAC. We compare our method on Atari 2600 games with different constant target entropy SAC, and analyze on how our scheduling affects SAC.
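离散SAC中温度$\alpha$通常通过"策略熵与目标熵之差"来自动调节,而TES-SAC的想法是让目标熵本身随训练退火。下面用numpy给出该更新规则的示意草图:其中的策略分布为占位数据,退火日程是假设的线性日程,并非论文给出的具体方案。

```python
import numpy as np

def policy_entropy(probs):
    p = np.clip(probs, 1e-8, 1.0)
    return float(-(p * np.log(p)).sum())

n_actions = 6
max_entropy = float(np.log(n_actions))        # 离散动作空间的最大熵
log_alpha, lr = 0.0, 1e-2
total_steps = 5000
rng = np.random.default_rng(9)

for step in range(total_steps):
    frac = step / total_steps
    # 假设的退火日程:目标熵从 0.9*最大熵 线性降到 0.3*最大熵
    target_entropy = (0.9 - 0.6 * frac) * max_entropy

    # 占位策略:随训练逐渐变得更确定(熵下降)
    logits = rng.normal(0, 1 + 3 * frac, n_actions)
    probs = np.exp(logits) / np.exp(logits).sum()
    H = policy_entropy(probs)

    # 温度更新:策略熵高于目标则减小 alpha,低于目标则增大 alpha
    grad = H - target_entropy                 # alpha 损失对 log_alpha 梯度的期望形式
    log_alpha -= lr * grad

    if step % 1000 == 0:
        print(f"step={step:5d} target_H={target_entropy:.3f} H={H:.3f} alpha={np.exp(log_alpha):.3f}")
```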

【15】 MDPGT: Momentum-based Decentralized Policy Gradient Tracking 标题:MDPGT:基于动量的分散策略梯度跟踪 链接:https://arxiv.org/abs/2112.02813

作者:Zhanhong Jiang,Xian Yeow Lee,Sin Yong Tan,Kai Liang Tan,Aditya Balu,Young M. Lee,Chinmay Hegde,Soumik Sarkar 机构:Johnson Controls Inc., East Michigan St, Milwaukee, WI , Iowa State University, Ames, IA , New York University, MetroTech Center, Brooklyn, NY 摘要:We propose a novel policy gradient method for multi-agent reinforcement learning, which leverages two different variance-reduction techniques and does not require large batches over iterations. Specifically, we propose a momentum-based decentralized policy gradient tracking (MDPGT) where a new momentum-based variance reduction technique is used to approximate the local policy gradient surrogate with importance sampling, and an intermediate parameter is adopted to track two consecutive policy gradient surrogates. Moreover, MDPGT provably achieves the best available sample complexity of $\mathcal{O}(N^{-1}\epsilon^{-3})$ for converging to an $\epsilon$-stationary point of the global average of $N$ local performance functions (possibly nonconcave). This outperforms the state-of-the-art sample complexity in decentralized model-free reinforcement learning, and when initialized with a single trajectory, the sample complexity matches those obtained by the existing decentralized policy gradient methods. We further validate the theoretical claim for the Gaussian policy function. When the required error tolerance $\epsilon$ is small enough, MDPGT leads to a linear speed up, which has been previously established in decentralized stochastic optimization, but not for reinforcement learning. Lastly, we provide empirical results on a multi-agent reinforcement learning benchmark environment to support our theoretical findings.
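摘要中的两个关键部件(基于动量的方差缩减代理梯度与梯度跟踪)可以用一个简化的分布式优化例子来示意:多个智能体各自持有带噪梯度,先做动量式代理更新,再通过混合矩阵做一致性平均与梯度跟踪。下面的numpy草图针对一个简单的二次目标,仅演示更新结构,省略了策略梯度与重要性采样的细节,所有数值均为假设。

```python
import numpy as np

rng = np.random.default_rng(10)
N, dim, T = 4, 5, 300                   # 4 个智能体,参数维度 5
W = np.full((N, N), 1.0 / N)            # 双随机混合矩阵(这里简单取全连接平均)
targets = rng.normal(size=(N, dim))      # 每个智能体的局部目标:最小化 ||theta - target_i||^2

def grad(i, th, noise):
    """智能体 i 的带噪局部梯度(二次目标,噪声模拟采样误差)。"""
    return 2 * (th - targets[i]) + noise

beta, lr = 0.2, 0.05
theta = np.zeros((N, dim))                                        # 各智能体的参数副本
u = np.array([grad(i, theta[i], rng.normal(0, 0.5, dim)) for i in range(N)])  # 动量式代理梯度
v = u.copy()                                                      # 梯度跟踪变量

for t in range(T):
    theta_new = W @ theta - lr * v                                # 一致性平均 + 沿跟踪梯度下降
    u_new = np.zeros_like(u)
    for i in range(N):
        noise = rng.normal(0, 0.5, dim)                           # 同一采样同时用于新旧两点(方差缩减的关键)
        g_new = grad(i, theta_new[i], noise)
        g_old = grad(i, theta[i], noise)
        # 基于动量的方差缩减代理:u = beta*g_new + (1-beta)*(u + g_new - g_old)
        u_new[i] = beta * g_new + (1 - beta) * (u[i] + g_new - g_old)
    v = W @ v + u_new - u                                         # 梯度跟踪:累积代理梯度的增量
    theta, u = theta_new, u_new

print("共识参数:", np.round(theta.mean(axis=0), 3))
print("全局最优(各局部目标的均值):", np.round(targets.mean(axis=0), 3))
```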

【16】 BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery 标题:BCD网:贝叶斯因果发现的可扩展变分方法 链接:https://arxiv.org/abs/2112.02761

作者:Chris Cundy,Aditya Grover,Stefano Ermon 机构:Department of Computer Science, Stanford University, Facebook AI Research, University of California, Los Angeles 备注:Neural Information Processing Systems 2021 摘要:A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG). Recent advances have enabled effective maximum-likelihood point estimation of DAGs from observational data. However, a point estimate may not accurately capture the uncertainty in inferring the underlying graph in practical scenarios, wherein the true DAG is non-identifiable and/or the observed dataset is limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is challenging due to the discrete and combinatorial nature of graphs. We analyse key design choices for scalable VI over DAGs, such as 1) the parametrization of DAGs via an expressive variational family, 2) a continuous relaxation that enables low-variance stochastic optimization, and 3) suitable priors over the latent variables. We provide a series of experiments on real and synthetic data showing that BCD Nets outperform maximum-likelihood methods on standard causal discovery metrics such as structural Hamming distance in low data regimes.

【17】 Team Hitachi @ AutoMin 2021: Reference-free Automatic Minuting Pipeline with Argument Structure Construction over Topic-based Summarization 标题:团队Hitachi@AutoMin 2021:在基于主题的摘要上构建具有论元结构的无引用自动会议记录流水线 链接:https://arxiv.org/abs/2112.02741

作者:Atsuki Yamaguchi,Gaku Morio,Hiroaki Ozaki,Ken-ichi Yokote,Kenji Nagamatsu 机构:Research and Development Group, Hitachi, Ltd., Kokubunji, Tokyo, Japan 备注:8 pages, 4 figures 摘要:This paper introduces the proposed automatic minuting system of the Hitachi team for the First Shared Task on Automatic Minuting (AutoMin-2021). We utilize a reference-free approach (i.e., without using training minutes) for automatic minuting (Task A), which first splits a transcript into blocks on the basis of topics and subsequently summarizes those blocks with a pre-trained BART model fine-tuned on a summarization corpus of chat dialogue. In addition, we apply a technique of argument mining to the generated minutes, reorganizing them in a well-structured and coherent way. We utilize multiple relevance scores to determine whether or not a minute is derived from the same meeting when either a transcript or another minute is given (Task B and C). On top of those scores, we train a conventional machine learning model to bind them and to make final decisions. Consequently, our approach for Task A achieve the best adequacy score among all submissions and close performance to the best system in terms of grammatical correctness and fluency. For Task B and C, the proposed model successfully outperformed a majority vote baseline.

【18】 NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation 标题:NL-Augmenter:一种任务敏感型自然语言增强框架 链接:https://arxiv.org/abs/2112.02721

作者:Kaustubh D. Dhole,Varun Gangal,Sebastian Gehrmann,Aadesh Gupta,Zhenhao Li,Saad Mahamood,Abinaya Mahendiran,Simon Mille,Ashish Srivastava,Samson Tan,Tongshuang Wu,Jascha Sohl-Dickstein,Jinho D. Choi,Eduard Hovy,Ondrej Dusek,Sebastian Ruder,Sajant Anand,Nagender Aneja,Rabin Banjade,Lisa Barthe,Hanna Behnke,Ian Berlot-Attwell,Connor Boyle,Caroline Brun,Marco Antonio Sobrevilla Cabezudo,Samuel Cahyawijaya,Emile Chapuis,Wanxiang Che,Mukund Choudhary,Christian Clauss,Pierre Colombo,Filip Cornell,Gautier Dagan,Mayukh Das,Tanay Dixit,Thomas Dopierre,Paul-Alexis Dray,Suchitra Dubey,Tatiana Ekeinhor,Marco Di Giovanni,Rishabh Gupta,Rishabh Gupta,Louanes Hamla,Sang Han,Fabrice Harel-Canada,Antoine Honore,Ishan Jindal,Przemyslaw K. Joniak,Denis Kleyko,Venelin Kovatchev,Kalpesh Krishna,Ashutosh Kumar,Stefan Langer,Seungjae Ryan Lee,Corey James Levinson,Hualou Liang,Kaizhao Liang,Zhexiong Liu,Andrey Lukyanenko,Vukosi Marivate,Gerard de Melo,Simon Meoni,Maxime Meyer,Afnan Mir,Nafise Sadat Moosavi,Niklas Muennighoff,Timothy Sum Hon Mun,Kenton Murray,Marcin Namysl,Maria Obedkova,Priti Oli,Nivranshu Pasricha,Jan Pfister,Richard Plant,Vinay Prabhu,Vasile Pais,Libo Qin,Shahab Raji,Pawan Kumar Rajpoot,Vikas Raunak,Roy Rinberg,Nicolas Roberts,Juan Diego Rodriguez,Claude Roux,Vasconcellos P. H. S.,Ananya B. Sai,Robin M. Schmidt,Thomas Scialom,Tshephisho Sefara,Saqib N. Shamsi,Xudong Shen,Haoyue Shi,Yiwen Shi,Anna Shvets,Nick Siegel,Damien Sileo,Jamie Simon,Chandan Singh,Roman Sitelew,Priyank Soni,Taylor Sorensen,William Soto,Aman Srivastava,KV Aditya Srivatsa,Tony Sun,Mukund Varma T,A Tabassum,Fiona Anting Tan,Ryan Teehan,Mo Tiwari,Marie Tolkiehn,Athena Wang,Zijian Wang,Gloria Wang,Zijie J. Wang,Fuxuan Wei,Bryan Wilie,Genta Indra Winata,Xinyi Wu,Witold Wydmański,Tianbao Xie,Usama Yaseen,M. Yee,Jing Zhang,Yue Zhang 机构:ACKO,Agara Labs,Amelia R&D, New York,Applied Research Laboratories, The University of Texas at Austin, Bloomberg,Brigham Young University,Carnegie Mellon University,Center for Data and Computing in Natural Sciences 备注:39 pages, repository at this https URL 摘要:Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (url{https://github.com/GEM-benchmark/NL-Augmenter}).

【19】 BERTMap: A BERT-based Ontology Alignment System 标题:BERTMap:一种基于BERT的本体对齐系统 链接:https://arxiv.org/abs/2112.02682

作者:Yuan He,Jiaoyan Chen,Denvar Antonyrajah,Ian Horrocks 机构: Department of Computer Science, University of Oxford, UK, Samsung Research, UK 备注:Full version (with appendix) of the accepted paper in 36th AAAI Conference on Artificial Intelligence 2022 摘要:Ontology alignment (a.k.a ontology matching (OM)) plays a critical role in knowledge integration. Owing to the success of machine learning in many domains, it has been applied in OM. However, the existing methods, which often adopt ad-hoc feature engineering or non-contextual word embeddings, have not yet outperformed rule-based systems especially in an unsupervised setting. In this paper, we propose a novel OM system named BERTMap which can support both unsupervised and semi-supervised settings. It first predicts mappings using a classifier based on fine-tuning the contextual embedding model BERT on text semantics corpora extracted from ontologies, and then refines the mappings through extension and repair by utilizing the ontology structure and logic. Our evaluation with three alignment tasks on biomedical ontologies demonstrates that BERTMap can often perform better than the leading OM systems LogMap and AML.

【20】 Using Static and Dynamic Malware features to perform Malware Ascription 标题:使用静态和动态恶意软件特征进行恶意软件归属 链接:https://arxiv.org/abs/2112.02639

作者:Jashanpreet Singh Sraw,Keshav Kumar 机构:a Thapar Institute of Engineering and Technology, Patiala, Punjab, India, b Savitribai Phule Pune University, Pune, Maharashtra, India 摘要:Malware ascription is a relatively unexplored area, and it is rather difficult to attribute malware and detect authorship. In this paper, we employ various Static and Dynamic features of malicious executables to classify malware based on their family. We leverage Cuckoo Sandbox and machine learning to make progress in this research. Post analysis, classification is performed using various deep learning and machine learning algorithms. Using the features gathered from VirusTotal (static) and Cuckoo (dynamic) reports, we ran the vectorized data against Multinomial Naive Bayes, Support Vector Machine, and Bagging using Decision Trees as the base estimator. For each classifier, we tuned the hyper-parameters using exhaustive search methods. Our reports can be extremely useful in malware ascription.
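摘要描述的是"特征向量化 + 多种分类器(多项式朴素贝叶斯、SVM、以决策树为基学习器的Bagging)"的流程。下面给出一个与该论文无关的scikit-learn示意:把恶意样本报告中的字符串特征向量化后训练三个分类器并比较精度;报告内容与家族标签均为虚构占位数据。

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# 占位数据:每个样本是报告中提取的字符串特征(API 名、DLL 名等),标签是恶意软件家族
reports = [
    "CreateRemoteThread kernel32.dll VirtualAlloc",
    "RegSetValueEx advapi32.dll persistence run_key",
    "CreateRemoteThread VirtualAllocEx WriteProcessMemory",
    "HttpSendRequest wininet.dll beacon c2",
    "RegCreateKey run_key startup persistence",
    "InternetOpenUrl wininet.dll download dropper",
] * 30
labels = ["injector", "persistence", "injector", "downloader", "persistence", "downloader"] * 30

X = CountVectorizer().fit_transform(reports)       # 词频向量化
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)

models = {
    "MultinomialNB": MultinomialNB(),
    "LinearSVC": LinearSVC(),
    "Bagging(DecisionTree)": BaggingClassifier(n_estimators=25, random_state=0),  # 默认基学习器即决策树
}
for name, model in models.items():
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: accuracy={acc:.3f}")
```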

【21】 Smart IoT-Biofloc water management system using Decision regression tree 标题:基于决策回归树的智能物联网-Biofloc水管理系统 链接:https://arxiv.org/abs/2112.02577

作者:Samsil Arefin Mozumder,A S M Sharifuzzaman Sagar 机构:East Delta University, Chattogram, Bangladesh; Sejong University, Seoul, South Korea 备注:This article is accepted in the International Conference on 4th Industrial Revolution and Beyond (IC4IR) 2021 proceedings 摘要:The conventional fishing industry faces several difficulties: water contamination, temperature instability, nutrition, area, expense, etc. In fish farming, Biofloc technology turns traditional farming into a sophisticated infrastructure that enables the utilization of leftover food by turning it into bacterial biomass. The purpose of our study is to propose an intelligent IoT Biofloc system that improves efficiency and production. This article introduces a system that gathers data from sensors, stores the data in the cloud, analyses it using a machine learning model such as a decision regression tree to predict the water condition, and provides real-time monitoring through an Android app. The proposed system achieved a satisfactory accuracy of 79% during the experiment.
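作为对上述建模步骤的最小示意(非论文原系统):下面用 scikit-learn 的决策回归树在虚构的传感器读数(温度、pH、溶解氧)上拟合一个"水质得分";特征列与数值均为假设,仅用于说明训练与预测流程。

# Minimal sketch: fit a decision regression tree on made-up sensor readings.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# columns: temperature (deg C), pH, dissolved oxygen (mg/L); target: water-quality score
X = np.array([[28.1, 7.2, 6.5], [30.4, 6.8, 5.1], [26.9, 7.5, 7.0],
              [31.2, 6.5, 4.2], [27.5, 7.1, 6.8], [29.8, 6.9, 5.5]])
y = np.array([0.90, 0.60, 0.95, 0.40, 0.88, 0.65])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
model = DecisionTreeRegressor(max_depth=3).fit(X_train, y_train)
print(model.predict(X_test))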

【22】 A Novel Approach to Solving Goal-Achieving Problems for Board Games 标题:一种解决棋类游戏目标实现问题的新方法 链接:https://arxiv.org/abs/2112.02563

作者:Chung-Chin Shih,Ti-Rong Wu,Ting Han Wei,I-Chen Wu 机构:National Yang Ming Chiao Tung University, Hsinchu, Taiwan, Research Center for Information Technology Innovation, Academia Sinica, Taiwan, Department of Computing Science, University of Alberta, Edmonton, Canada 备注:Accepted by AAAI2022. In this version, supplementary materials are added 摘要:Goal-achieving problems are puzzles that set up a specific situation with a clear objective. An example that is well-studied is the category of life-and-death (L&D) problems for Go, which helps players hone their skill of identifying region safety. Many previous methods like lambda search try null moves first, then derive so-called relevance zones (RZs), outside of which the opponent does not need to search. This paper first proposes a novel RZ-based approach, called the RZ-Based Search (RZS), to solving L&D problems for Go. RZS tries moves before determining whether they are null moves post-hoc. This means we do not need to rely on null move heuristics, resulting in a more elegant algorithm, so that it can also be seamlessly incorporated into AlphaZero's super-human level play in our solver. To repurpose AlphaZero for solving, we also propose a new training method called Faster to Life (FTL), which modifies AlphaZero to entice it to win more quickly. We use RZS and FTL to solve L&D problems on Go, namely solving 68 among 106 problems from a professional L&D book while a previous program solves 11 only. Finally, we discuss that the approach is generic in the sense that RZS is applicable to solving many other goal-achieving problems for board games.

【23】 Inf-CP: A Reliable Channel Pruning based on Channel Influence 标题:Inf-CP:一种基于通道影响的可靠通道剪枝 链接:https://arxiv.org/abs/2112.02521

作者:Bilan Lai,Haoran Xiang,Furao Shen 机构:School of Artificial Intelligence, Nanjing University 摘要:One of the most effective methods of channel pruning is to trim on the basis of the importance of each neuron. However, measuring the importance of each neuron is an NP-hard problem. Previous works have proposed to trim by considering the statistics of a single layer or a plurality of successive layers of neurons. These works cannot eliminate the influence of different data on the model in the reconstruction error, and currently there is no work proving that the absolute values of the parameters can be directly used as the basis for judging the importance of the weights. A more reasonable approach is to eliminate the differences between batches of data that bias the measurement of weight influence. In this paper, we propose to use ensemble learning to train a model for different batches of data and use the influence function (a classic technique from robust statistics) to track each model's predictions and the gradients of its training parameters, so that we can determine the responsibility of each parameter, which we call "influence", in the prediction process. In addition, we theoretically prove that the back-propagation of the deep network is a first-order Taylor approximation of the influence function of the weights. We perform extensive experiments to show that pruning based on the influence function, combined with the idea of ensemble learning, is much more effective than focusing only on error reconstruction. Experiments on CIFAR show that influence pruning achieves state-of-the-art results.
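下面给出一个大幅简化的示意(并非论文完整的影响函数估计流程):按摘要中"反向传播是权重影响函数的一阶泰勒近似"的说法,用 |w·∂L/∂w| 在每个输出通道上求和,作为通道重要性的一阶近似分数;对多批数据、多个集成模型取平均的环节此处省略。

# Hedged sketch: first-order Taylor importance per output channel of a conv layer,
# used here as a stand-in for the influence score; not the authors' exact estimator.
import torch
import torch.nn as nn

def channel_importance(conv: nn.Conv2d, loss: torch.Tensor) -> torch.Tensor:
    """One score per output channel: sum of |w * dL/dw| over that channel's weights."""
    grad = torch.autograd.grad(loss, conv.weight, retain_graph=True)[0]
    return (conv.weight * grad).abs().sum(dim=(1, 2, 3))

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(4, 3, 16, 16)           # toy batch
loss = conv(x).pow(2).mean()            # toy loss
scores = channel_importance(conv, loss)
print(scores, torch.argsort(scores)[:2])  # lowest-scoring channels are pruning candidates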

【24】 Neural Photometry-guided Visual Attribute Transfer 标题:神经光度学引导的视觉属性传递 链接:https://arxiv.org/abs/2112.02520

作者:Carlos Rodriguez-Pardo,Elena Garces 机构:Universidad Carlos III de Madrid, Spain; Universidad Rey Juan Carlos, Spain 备注:13 pages. To be published in Transactions on Visualization and Computer Graphics. Project website: this http URL 摘要:We present a deep learning-based method for propagating spatially-varying visual material attributes (e.g. texture maps or image stylizations) to larger samples of the same or similar materials. For training, we leverage images of the material taken under multiple illuminations and a dedicated data augmentation policy, making the transfer robust to novel illumination conditions and affine deformations. Our model relies on a supervised image-to-image translation framework and is agnostic to the transferred domain; we showcase a semantic segmentation, a normal map, and a stylization. Following an image analogies approach, the method only requires the training data to contain the same visual structures as the input guidance. Our approach works at interactive rates, making it suitable for material edit applications. We thoroughly evaluate our learning methodology in a controlled setup providing quantitative measures of performance. Last, we demonstrate that training the model on a single material is enough to generalize to materials of the same type without the need for massive datasets.

【25】 A Novel Sequential Coreset Method for Gradient Descent Algorithms 标题:一种用于梯度下降算法的新型顺序核心集(Coreset)方法 链接:https://arxiv.org/abs/2112.02504

作者:Jiawei Huang,Ruomin Huang,Wenjie Liu,Nikolaos M. Freris,Hu Ding 机构:School of Computer Science and Technology, University of Science and Technology of China, Anhui, China; School of Data Science, University of Science and Technology of China 摘要:A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational complexity. Coreset is a popular data compression technique that has been extensively studied before. However, most existing coreset methods are problem-dependent and cannot be used as a general tool for a broader range of applications. A key obstacle is that they often rely on the pseudo-dimension and total sensitivity bound, which can be very high or hard to obtain. In this paper, based on the "locality" property of gradient descent algorithms, we propose a new framework, termed "sequential coreset", which effectively avoids these obstacles. Moreover, our method is particularly suitable for sparse optimization, whence the coreset size can be further reduced to be only poly-logarithmically dependent on the dimension. In practice, the experimental results suggest that our method can save a large amount of running time compared with the baseline algorithms.

【26】 Radial Basis Function Approximation with Distributively Stored Data on Spheres 标题:球面上分布存储数据的径向基函数逼近 链接:https://arxiv.org/abs/2112.02499

作者:Han Feng,Shao-Bo Lin,Ding-Xuan Zhou 备注:19 pages 摘要:This paper proposes a distributed weighted regularized least squares algorithm (DWRLS) based on spherical radial basis functions and spherical quadrature rules to tackle spherical data that are stored across numerous local servers and cannot be shared with each other. Via developing a novel integral operator approach, we succeed in deriving optimal approximation rates for DWRLS and theoretically demonstrate that DWRLS performs similarly as running a weighted regularized least squares algorithm with the whole data on a large enough machine. This interesting finding implies that distributed learning is capable of sufficiently exploiting potential values of distributively stored spherical data, even though every local server cannot access all the data.

【27】 Exploring Complicated Search Spaces with Interleaving-Free Sampling 标题:用非交错抽样探索复杂搜索空间 链接:https://arxiv.org/abs/2112.02488

作者:Yunjie Tian,Lingxi Xie,Jiemin Fang,Jianbin Jiao,Qixiang Ye,Qi Tian 机构:University of Chinese Academy of Sciences, Huawei Inc., Huazhong University of Science and Technology 备注:9 pages, 8 figures, 6 tables 摘要:The existing neural architecture search algorithms are mostly working on search spaces with short-distance connections. We argue that such designs, though safe and stable, obstruct the search algorithms from exploring more complicated scenarios. In this paper, we build the search algorithm upon a complicated search space with long-distance connections, and show that existing weight-sharing search algorithms mostly fail due to the existence of interleaved connections. Based on this observation, we present a simple yet effective algorithm named IF-NAS, where we perform a periodic sampling strategy to construct different sub-networks during the search procedure, preventing interleaved connections from emerging in any of them. In the proposed search space, IF-NAS outperforms both random sampling and previous weight-sharing search algorithms by a significant margin. IF-NAS also generalizes to the micro cell-based spaces, which are much easier. Our research emphasizes the importance of macro structure, and we look forward to further efforts along this direction.

【28】 Variational Wasserstein gradient flow 标题:变分Wasserstein梯度流 链接:https://arxiv.org/abs/2112.02424

作者:Jiaojiao Fan,Amirhossein Taghvaei,Yongxin Chen 机构:Georgia Institute of Technology, University of Washington, Seattle 摘要:The gradient flow of a function over the space of probability densities with respect to the Wasserstein metric often exhibits nice properties and has been utilized in several machine learning applications. The standard approach to compute the Wasserstein gradient flow is the finite difference which discretizes the underlying space over a grid, and is not scalable. In this work, we propose a scalable proximal gradient type algorithm for Wasserstein gradient flow. The key of our method is a variational formulation of the objective function, which makes it possible to realize the JKO proximal map through a primal-dual optimization. This primal-dual problem can be efficiently solved by alternatively updating the parameters in the inner and outer loops. Our framework covers all the classical Wasserstein gradient flows including the heat equation and the porous medium equation. We demonstrate the performance and scalability of our algorithm with several numerical examples.
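作为背景补充:摘要中提到的 JKO 近端映射即文献中的标准格式 $\rho_{k+1} = \arg\min_{\rho} \, F(\rho) + \frac{1}{2\tau} W_2^2(\rho,\rho_k)$,其中 $F$ 为能量泛函、$\tau$ 为步长、$W_2$ 为二阶 Wasserstein 距离;当 $\tau \to 0$ 时迭代序列逼近 $F$ 在 Wasserstein 度量下的梯度流。论文的变分原始-对偶求解正是针对该子问题,此处仅给出通用写法以便理解。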

【29】 Efficient Pressure: Improving efficiency for signalized intersections 标题:有效压力:提高信号交叉口的效率 链接:https://arxiv.org/abs/2112.02336

作者:Qiang Wu,Liang Zhang,Jun Shen,Linyuan Lü,Bo Du,Jianqing Wu 机构:Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; School of Life Sciences, Lanzhou University, Lanzhou, China 备注:7 pages, 3 figures 摘要:Since conventional approaches could not adapt to dynamic traffic conditions, reinforcement learning (RL) has attracted more attention to help solve the traffic signal control (TSC) problem. However, existing RL-based methods are rarely deployed considering that they are neither cost-effective in terms of computing resources nor more robust than traditional approaches, which raises a critical research question: how to construct an adaptive controller for TSC with less training and reduced complexity based on RL-based approach? To address this question, in this paper, we (1) innovatively specify the traffic movement representation as a simple but efficient pressure of vehicle queues in a traffic network, namely efficient pressure (EP); (2) build a traffic signal settings protocol, including phase duration, signal phase number and EP for TSC; (3) design a TSC approach based on the traditional max pressure (MP) approach, namely efficient max pressure (Efficient-MP) using the EP to capture the traffic state; and (4) develop a general RL-based TSC algorithm template: efficient Xlight (Efficient-XLight) under EP. Through comprehensive experiments on multiple real-world datasets in our traffic signal settings' protocol for TSC, we demonstrate that efficient pressure is complementary to traditional and RL-based modeling to design better TSC methods. Our code is released on Github.
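为便于理解上文的"压力"概念,下面给出经典 max pressure(MP)选相规则的极简示意:每个相位的压力为其各转向流"入口排队数减出口排队数"之和,每次放行压力最大的相位;论文提出的 efficient pressure(EP)改变了排队信息的聚合方式,此处不作复现,示例中的排队数字均为虚构。

# Sketch of the classic max-pressure (MP) rule that Efficient-MP builds on;
# the paper's efficient pressure (EP) aggregation is not reproduced here.
def movement_pressure(in_queue: int, out_queue: int) -> int:
    """Pressure of one traffic movement: incoming minus outgoing queue length."""
    return in_queue - out_queue

def choose_phase(phases):
    """phases: {name: [(in_queue, out_queue), ...]}; pick the max total-pressure phase."""
    return max(phases, key=lambda p: sum(movement_pressure(i, o) for i, o in phases[p]))

phases = {
    "NS_through": [(12, 3), (9, 5)],   # made-up queue counts
    "EW_through": [(4, 2), (6, 6)],
}
print(choose_phase(phases))  # -> "NS_through"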

【30】 ALX: Large Scale Matrix Factorization on TPUs 标题:ALX:TPU上的大规模矩阵分解 链接:https://arxiv.org/abs/2112.02194

作者:Harsh Mehta,Steffen Rendle,Walid Krichene,Li Zhang 机构:Google Research 摘要:We present ALX, an open-source library for distributed matrix factorization using Alternating Least Squares, written in JAX. Our design allows for efficient use of the TPU architecture and scales well to matrix factorization problems of O(B) rows/columns by scaling the number of available TPU cores. In order to spur future research on large scale matrix factorization methods and to illustrate the scalability properties of our own implementation, we also built a real world web link prediction dataset called WebGraph. This dataset can be easily modeled as a matrix factorization problem. We created several variants of this dataset based on locality and sparsity properties of sub-graphs. The largest variant of WebGraph has around 365M nodes and training a single epoch finishes in about 20 minutes with 256 TPU cores. We include speed and performance numbers of ALX on all variants of WebGraph. Both the framework code and the dataset is open-sourced.
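下面用纯 NumPy 给出交替最小二乘(ALS)的一个稠密玩具示意,对应 ALX 在 JAX/TPU 上分布式执行的同一核心更新;R 为随机生成的占位矩阵,缺失项处理与跨 TPU 核的分片等工程细节均未包含。

# Dense toy ALS sketch in plain NumPy (ALX shards the same updates in JAX over TPUs).
import numpy as np

def als_step(R, V, lam):
    """Ridge-regularized least-squares update for the row factors given V."""
    k = V.shape[1]
    A = V.T @ V + lam * np.eye(k)
    return np.linalg.solve(A, V.T @ R.T).T   # one factor row per row of R

rng = np.random.default_rng(0)
R = rng.random((6, 5))                        # toy matrix to factorize
U, V = rng.normal(size=(6, 3)), rng.normal(size=(5, 3))
for _ in range(20):                           # alternate between the two factors
    U = als_step(R, V, lam=0.1)
    V = als_step(R.T, U, lam=0.1)
print(np.linalg.norm(R - U @ V.T))            # reconstruction error shrinks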

【31】 Neural Pseudo-Label Optimism for the Bank Loan Problem 标题:银行贷款问题的神经伪标签乐观算法 链接:https://arxiv.org/abs/2112.02185

作者:Aldo Pacchiano,Shaun Singh,Edward Chou,Alexander C. Berg,Jakob Foerster 机构:Microsoft Research, FAIR 备注:10 pages main, 14 pages appendix 摘要:We study a class of classification problems best exemplified by the bank loan problem, where a lender decides whether or not to issue a loan. The lender only observes whether a customer will repay a loan if the loan is issued to begin with, and thus modeled decisions affect what data is available to the lender for future decisions. As a result, it is possible for the lender's algorithm to "get stuck" with a self-fulfilling model. This model never corrects its false negatives, since it never sees the true label for rejected data, thus accumulating infinite regret. In the case of linear models, this issue can be addressed by adding optimism directly into the model predictions. However, there are few methods that extend to the function approximation case using Deep Neural Networks. We present Pseudo-Label Optimism (PLOT), a conceptually and computationally simple method for this setting applicable to DNNs. PLOT adds an optimistic label to the subset of decision points the current model is deciding on, trains the model on all data so far (including these points along with their optimistic labels), and finally uses the resulting optimistic model for decision making. PLOT achieves competitive performance on a set of three challenging benchmark problems, requiring minimal hyperparameter tuning. We also show that PLOT satisfies a logarithmic regret guarantee, under a Lipschitz and logistic mean label model, and under a separability condition on the data.
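下面是 PLOT 思路的示意草图(非论文原实现):用逻辑回归暂代论文中的深度网络,为当前待决策点加上乐观的正伪标签,与已观测数据合并重新训练,再用得到的"乐观"模型做放贷决策;数据为随机生成的占位样本。

# Schematic sketch of pseudo-label optimism with a linear stand-in for the DNN.
import numpy as np
from sklearn.linear_model import LogisticRegression

def plot_decide(X_obs, y_obs, X_pending):
    """Retrain on observed data plus optimistically-labeled pending points, then decide."""
    X_aug = np.vstack([X_obs, X_pending])
    y_aug = np.concatenate([y_obs, np.ones(len(X_pending))])  # optimistic positive labels
    model = LogisticRegression().fit(X_aug, y_aug)
    return model.predict(X_pending)  # 1 = issue the loan, so its true label gets revealed

rng = np.random.default_rng(0)
X_obs = rng.normal(size=(20, 3))
y_obs = (X_obs[:, 0] > 0).astype(int)     # toy repayment labels
X_pending = rng.normal(size=(5, 3))
print(plot_decide(X_obs, y_obs, X_pending))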

【32】 Counterfactual Fairness in Mortgage Lending via Matching and Randomization 标题:基于匹配与随机化的抵押贷款反事实公平性 链接:https://arxiv.org/abs/2112.02170

作者:Sama Ghoba,Nathan Colaner 机构:Albers School of Business and Economics, Seattle University 备注:In NeurIPS 2021 Workshop on Algorithmic Fairness through the lens of Causality and Robustness 摘要:Unfairness in mortgage lending has created generational inequality among racial and ethnic groups in the US. Many studies address this problem, but most existing work focuses on correlation-based techniques. In our work, we use the framework of counterfactual fairness to train fair machine learning models. We propose a new causal graph for the variables available in the Home Mortgage Disclosure Act (HMDA) data. We use a matching-based approach instead of the latent variable modeling approach, because the former approach does not rely on any modeling assumptions. Furthermore, matching provides us with counterfactual pairs in which the race variable is isolated. We first demonstrate the unfairness in mortgage approval and interest rates between African-American and non-Hispanic White sub-populations. Then, we show that having balanced data using matching does not guarantee perfect counterfactual fairness of the machine learning models.

【33】 On Submodular Contextual Bandits 标题:关于子模上下文赌博机 链接:https://arxiv.org/abs/2112.02165

作者:Dean P. Foster,Alexander Rakhlin 摘要:We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function that belongs to a class $\mathcal{F}$. We allow time-varying matroid constraints to be placed on the feasible sets. Assuming access to an online regression oracle with regret $\mathsf{Reg}(\mathcal{F})$, our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy. We show that cumulative regret of this procedure with time horizon $n$ scales as $O(\sqrt{n\,\mathsf{Reg}(\mathcal{F})})$ against a benchmark with a multiplicative factor $1/2$. On the other hand, using the techniques of (Filmus and Ward 2014), we show that an $\epsilon$-Greedy procedure with local randomization attains regret of $O(n^{2/3}\,\mathsf{Reg}(\mathcal{F})^{1/3})$ against a stronger $(1-e^{-1})$ benchmark.
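作为背景补充,摘要中的 Inverse Gap Weighting 在有限动作集上的标准写法为:记 $\hat{a} = \arg\max_a \hat{f}(x,a)$,则对 $a \neq \hat{a}$ 取 $p(a) = \frac{1}{A + \gamma\,(\hat{f}(x,\hat{a}) - \hat{f}(x,a))}$,并令 $p(\hat{a}) = 1 - \sum_{a \neq \hat{a}} p(a)$,其中 $A$ 为动作数、$\gamma$ 为探索参数。此为通用形式,论文将其用于围绕估计子模函数局部最优解的随机化,具体构造以论文为准。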

【34】 ProbNum: Probabilistic Numerics in Python 标题:ProbNum:Python中的概率数值 链接:https://arxiv.org/abs/2112.02100

作者:Jonathan Wenger,Nicholas Krämer,Marvin Pförtner,Jonathan Schmidt,Nathanael Bosch,Nina Effenberger,Johannes Zenn,Alexandra Gessner,Toni Karvonen,François-Xavier Briol,Maren Mahsereci,Philipp Hennig 机构:University of Tübingen, Tübingen, Germany, University of Helsinki, Helsinki, Finland, University College London, London, UK 摘要:Probabilistic numerical methods (PNMs) solve numerical problems via probabilistic inference. They have been developed for linear algebra, optimization, integration and differential equation simulation. PNMs naturally incorporate prior information about a problem and quantify uncertainty due to finite computational resources as well as stochastic input. In this paper, we present ProbNum: a Python library providing state-of-the-art probabilistic numerical solvers. ProbNum enables custom composition of PNMs for specific problem classes via a modular design as well as wrappers for off-the-shelf use. Tutorials, documentation, developer guides and benchmarks are available online at www.probnum.org.

【35】 Bounding Wasserstein distance with couplings 标题:利用耦合给出Wasserstein距离的上界 链接:https://arxiv.org/abs/2112.03152

作者:Niloy Biswas,Lester Mackey 机构:Harvard University, Microsoft Research New England 备注:53 pages, 10 figures 摘要:Markov chain Monte Carlo (MCMC) provides asymptotically consistent estimates of intractable posterior expectations as the number of iterations tends to infinity. However, in large data applications, MCMC can be computationally expensive per iteration. This has catalyzed interest in sampling methods such as approximate MCMC, which trade off asymptotic consistency for improved computational speed. In this article, we propose estimators based on couplings of Markov chains to assess the quality of such asymptotically biased sampling methods. The estimators give empirical upper bounds of the Wasserstein distance between the limiting distribution of the asymptotically biased sampling method and the original target distribution of interest. We establish theoretical guarantees for our upper bounds and show that our estimators can remain effective in high dimensions. We apply our quality measures to stochastic gradient MCMC, variational Bayes, and Laplace approximations for tall data and to approximate MCMC for Bayesian logistic regression in 4500 dimensions and Bayesian linear regression in 50000 dimensions.
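补充说明"基于耦合的上界"的原理:对 $P$、$Q$ 的任意耦合 $\gamma$(即边缘分布分别为 $P$ 与 $Q$ 的联合分布),由 Wasserstein 距离取下确界的定义即有 $W_p(P,Q) \le \bigl(\mathbb{E}_{(X,Y)\sim\gamma}\,\|X-Y\|^p\bigr)^{1/p}$;因此只要为两条马尔可夫链构造出可模拟的耦合,用样本均值估计右端即可得到经验上界。论文中估计量的具体构造与理论保证以原文为准。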

【36】 Tunable Image Quality Control of 3-D Ultrasound using Switchable CycleGAN 标题:基于可切换CycleGAN的三维超声可调图像质量控制 链接:https://arxiv.org/abs/2112.02896

作者:Jaeyoung Huh,Shujaat Khan,Sungjin Choi,Dongkuk Shin,Eun Sun Lee,Jong Chul Ye 机构:Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea; System R&D Group, Samsung Medison Co., Ltd., Seoul, Korea 摘要:In contrast to 2-D ultrasound (US) for uniaxial plane imaging, a 3-D US imaging system can visualize a volume along three axial planes. This allows for a full view of the anatomy, which is useful for gynecological (GYN) and obstetrical (OB) applications. Unfortunately, the 3-D US has an inherent limitation in resolution compared to the 2-D US. In the case of 3-D US with a 3-D mechanical probe, for example, the image quality is comparable along the beam direction, but significant deterioration in image quality is often observed in the other two axial image planes. To address this, here we propose a novel unsupervised deep learning approach to improve 3-D US image quality. In particular, using unmatched high-quality 2-D US images as a reference, we trained a recently proposed switchable CycleGAN architecture so that every mapping plane in 3-D US can learn the image quality of 2-D US images. Thanks to the switchable architecture, our network can also provide real-time control of image enhancement level based on user preference, which is ideal for a user-centric scanner setup. Extensive experiments with clinical evaluation confirm that our method offers significantly improved image quality as well as user-friendly flexibility.

【37】 Generalized Likelihood Ratio Test for Adversarially Robust Hypothesis Testing 标题:面向对抗鲁棒假设检验的广义似然比检验 链接:https://arxiv.org/abs/2112.02209

作者:Bhagyashree Puranik,Upamanyu Madhow,Ramtin Pedarsani 备注:Submitted to the IEEE Transactions on Signal Processing 摘要:Machine learning models are known to be susceptible to adversarial attacks which can cause misclassification by introducing small but well designed perturbations. In this paper, we consider a classical hypothesis testing problem in order to develop fundamental insight into defending against such adversarial perturbations. We interpret an adversarial perturbation as a nuisance parameter, and propose a defense based on applying the generalized likelihood ratio test (GLRT) to the resulting composite hypothesis testing problem, jointly estimating the class of interest and the adversarial perturbation. While the GLRT approach is applicable to general multi-class hypothesis testing, we first evaluate it for binary hypothesis testing in white Gaussian noise under $\ell_{\infty}$ norm-bounded adversarial perturbations, for which a known minimax defense optimizing for the worst-case attack provides a benchmark. We derive the worst-case attack for the GLRT defense, and show that its asymptotic performance (as the dimension of the data increases) approaches that of the minimax defense. For non-asymptotic regimes, we show via simulations that the GLRT defense is competitive with the minimax approach under the worst-case attack, while yielding a better robustness-accuracy tradeoff under weaker attacks. We also illustrate the GLRT approach for a multi-class hypothesis testing problem, for which a minimax strategy is not known, evaluating its performance under both noise-agnostic and noise-aware adversarial settings, by providing a method to find optimal noise-aware attacks, and heuristics to find noise-agnostic attacks that are close to optimal in the high SNR regime.
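补充一条示意性公式(通用 GLRT 写法,具体高斯信号模型以论文为准):将对抗扰动 $e$ 视为冗余参数后,GLRT 判决为 $\hat{k} = \arg\max_{k} \max_{\|e\|_{\infty} \le \epsilon} p(y \mid H_k, e)$,即先在扰动约束内最大化各假设下的似然,再在假设之间取最大。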

【38】 Data Fusion with Latent Map Gaussian Processes 标题:基于潜映射高斯过程的数据融合 链接:https://arxiv.org/abs/2112.02206

作者:Nicholas Oune,Jonathan Tammer Eweis-Labolle,Ramin Bostanabad 机构:Mechanical and Aerospace Engineering, University of California, Irvine, Irvine, California, USA 摘要:Multi-fidelity modeling and calibration are data fusion tasks that ubiquitously arise in engineering design. In this paper, we introduce a novel approach based on latent-map Gaussian processes (LMGPs) that enables efficient and accurate data fusion. In our approach, we convert data fusion into a latent space learning problem where the relations among different data sources are automatically learned. This conversion endows our approach with attractive advantages such as increased accuracy, reduced costs, flexibility to jointly fuse any number of data sources, and ability to visualize correlations between data sources. This visualization allows the user to detect model form errors or determine the optimum strategy for high-fidelity emulation by fitting LMGP only to the subset of the data sources that are well-correlated. We also develop a new kernel function that enables LMGPs to not only build a probabilistic multi-fidelity surrogate but also estimate calibration parameters with high accuracy and consistency. The implementation and use of our approach are considerably simpler and less prone to numerical issues compared to existing technologies. We demonstrate the benefits of LMGP-based data fusion by comparing its performance against competing methods on a wide range of examples.

【39】 Echocardiography Segmentation with Enforced Temporal Consistency 标题:施加时间一致性约束的超声心动图分割 链接:https://arxiv.org/abs/2112.02102

作者:Nathan Painchaud,Nicolas Duchateau,Olivier Bernard,Pierre-Marc Jodoin 备注:10 pages, submitted to IEEE TMI 摘要:Convolutional neural networks (CNN) have demonstrated their ability to segment 2D cardiac ultrasound images. However, despite recent successes according to which the intra-observer variability on end-diastole and end-systole images has been reached, CNNs still struggle to leverage temporal information to provide accurate and temporally consistent segmentation maps across the whole cycle. Such consistency is required to accurately describe the cardiac function, a necessary step in diagnosing many cardiovascular diseases. In this paper, we propose a framework to learn the 2D+time long-axis cardiac shape such that the segmented sequences can benefit from temporal and anatomical consistency constraints. Our method is a post-processing that takes as input segmented echocardiographic sequences produced by any state-of-the-art method and processes it in two steps to (i) identify spatio-temporal inconsistencies according to the overall dynamics of the cardiac sequence and (ii) correct the inconsistencies. The identification and correction of cardiac inconsistencies relies on a constrained autoencoder trained to learn a physiologically interpretable embedding of cardiac shapes, where we can both detect and fix anomalies. We tested our framework on 98 full-cycle sequences from the CAMUS dataset, which will be rendered public alongside this paper. Our temporal regularization method not only improves the accuracy of the segmentation across the whole sequences, but also enforces temporal and anatomical consistency.

机器翻译,仅供参考