
机器学习学术速递[12.8]


cs.LG 方向,今日共计119篇

Graph相关(图学习|图神经网络|图优化等)(12篇)

【1】 OOD-GNN: Out-of-Distribution Generalized Graph Neural Network 标题:OOD-GNN:分布外泛化图神经网络 链接:https://arxiv.org/abs/2112.03806

作者:Haoyang Li,Xin Wang,Ziwei Zhang,Wenwu Zhu 机构:Tsinghua University 备注:18 pages 摘要:当测试和训练来自相同分布的图形数据时,图形神经网络(GNNs)已经取得了令人印象深刻的性能。然而,现有的GNN缺乏分布外泛化能力,当测试和训练图形数据之间存在分布偏移时,其性能会显著下降。为了解决这个问题,在这项工作中,我们提出了一种分布外的广义图神经网络(OOD-GNN),用于在具有不同训练图分布的不可见测试图上获得令人满意的性能。我们提出的OOD-GNN采用了一种新的利用随机傅立叶特征的非线性图表示去相关方法,该方法鼓励模型通过迭代优化样本图权重和图编码器来消除相关和不相关图表示之间的统计依赖性。我们进一步设计了一个全局权重估计器来学习训练图的权重,使得图表示中的变量是独立的。学习的权重有助于图形编码器摆脱虚假的相关性,进而更加关注学习的区别性图形表示和它们的基本真值标签之间的真实联系。我们在两个合成数据集和12个具有分布移位的真实数据集上进行了大量实验,以验证分布外泛化能力。结果表明,我们提出的OOD-GNN显著优于最先进的基线。 摘要:Graph neural networks (GNNs) have achieved impressive performance when testing and training graph data come from identical distribution. However, existing GNNs lack out-of-distribution generalization abilities so that their performance substantially degrades when there exist distribution shifts between testing and training graph data. To solve this problem, in this work, we propose an out-of-distribution generalized graph neural network (OOD-GNN) for achieving satisfactory performance on unseen testing graphs that have different distributions with training graphs. Our proposed OOD-GNN employs a novel nonlinear graph representation decorrelation method utilizing random Fourier features, which encourages the model to eliminate the statistical dependence between relevant and irrelevant graph representations through iteratively optimizing the sample graph weights and graph encoder. We further design a global weight estimator to learn weights for training graphs such that variables in graph representations are forced to be independent. The learned weights help the graph encoder to get rid of spurious correlations and, in turn, concentrate more on the true connection between learned discriminative graph representations and their ground-truth labels. We conduct extensive experiments to validate the out-of-distribution generalization abilities on two synthetic and 12 real-world datasets with distribution shifts. The results demonstrate that our proposed OOD-GNN significantly outperforms state-of-the-art baselines.
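下面是一段仅依据上述摘要推测的 numpy 极简示意(并非论文官方实现),用来说明"利用随机傅立叶特征对图表示做加权去相关"这一核心思想:把每对表示维度映射到随机傅立叶特征空间,再用可优化的样本权重最小化它们之间的加权交叉协方差。逐维两两配对的写法和各函数名均为示例假设。

```python
import numpy as np

def random_fourier_features(x, n_feat=32, sigma=1.0, seed=0):
    """把一维特征列映射到随机傅立叶特征空间(近似高斯核)。"""
    rng = np.random.default_rng(seed)
    w = rng.normal(0, 1.0 / sigma, size=(x.shape[1], n_feat))
    b = rng.uniform(0, 2 * np.pi, size=n_feat)
    return np.sqrt(2.0 / n_feat) * np.cos(x @ w + b)

def weighted_decorrelation_loss(z, sample_w):
    """用样本权重度量图表示各维度之间(经 RFF 映射后)的统计相关性。
    z: (n, d) 的图表示;sample_w: (n,) 的非负样本权重。"""
    n, d = z.shape
    loss = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            fi = random_fourier_features(z[:, [i]], seed=i)
            fj = random_fourier_features(z[:, [j]], seed=j)
            # 加权去均值
            wi = fi - np.average(fi, axis=0, weights=sample_w)
            wj = fj - np.average(fj, axis=0, weights=sample_w)
            # 加权交叉协方差的 Frobenius 范数作为相关性度量
            cov = (wi * sample_w[:, None]).T @ wj / n
            loss += np.linalg.norm(cov, "fro") ** 2
    return loss

# 用法示意:交替地固定图编码器、优化样本权重使该损失最小,再用加权损失更新编码器
z = np.random.randn(128, 8)   # 128 个图的 8 维表示(随机示例)
w = np.ones(128)              # 初始样本权重
print(weighted_decorrelation_loss(z, w))
```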

【2】 CCasGNN: Collaborative Cascade Prediction Based on Graph Neural Networks 标题:CCasGNN:基于图神经网络的协同级联预测 链接:https://arxiv.org/abs/2112.03644

作者:Yansong Wang,Xiaomeng Wang,Tao Jia 机构:College of Computer and Information Science, Southwest University, Chongqing, China 摘要:级联预测的目的是对网络中的信息扩散进行建模。大多数以前的方法集中于从网络和传播路径中挖掘结构或序列特征。最近的工作致力于通过图神经网络和递归神经网络来结合网络结构和序列特征。然而,谱方法或空间方法的局限性限制了预测性能的提高。此外,递归神经网络耗时长、计算量大,导致预测效率低下。在这里,我们提出了一种新方法CCasGNN,综合考虑个体画像(profile)、结构特征和序列信息。该方法采用GAT和GCN的协作框架,并将位置编码叠加进图神经网络层中;这一设计不同于现有方法,并表现出良好的性能。在两个真实数据集上进行的实验证实,与最先进的方法相比,我们的方法显著提高了预测精度。更重要的是,消融研究考察了我们方法中每个组成部分的贡献。 摘要:Cascade prediction aims at modeling information diffusion in the network. Most previous methods concentrate on mining either structural or sequential features from the network and the propagation path. Recent efforts devoted to combining network structure and sequence features by graph neural networks and recurrent neural networks. Nevertheless, the limitation of spectral or spatial methods restricts the improvement of prediction performance. Moreover, recurrent neural networks are time-consuming and computation-expensive, which causes the inefficiency of prediction. Here, we propose a novel method CCasGNN considering the individual profile, structural features, and sequence information. The method benefits from using a collaborative framework of GAT and GCN and stacking positional encoding into the layers of graph neural networks, which is different from all existing ones and demonstrates good performance. The experiments conducted on two real-world datasets confirm that our method significantly improves the prediction accuracy compared to state-of-the-art approaches. What's more, the ablation study investigates the contribution of each component in our method.

【3】 Permutation Equivariant Generative Adversarial Networks for Graphs 标题:图的置换等变生成对抗网络 链接:https://arxiv.org/abs/2112.03621

作者:Yoann Boget,Magda Gregorova,Alexandros Kalousis 机构:University of Geneva, Geneva School of Business, Administration HES-SO, Carouge, Switzerland, Center for Artificial Intelligence, and Robotics (CAIRO), FHWS, Würzburg-Schweinfurt, Germany 备注:ELLIS Machine Learning for Molecule Discovery Workshop. 5 pages + ref. + appendix 摘要:图生成建模中讨论最多的问题之一是表示的顺序。一种解决方案是使用等变生成函数,以确保排序不变性。在讨论了这些函数的一些性质之后,我们提出了3G-GAN,这是一个依赖于GANs和等变函数的三阶段模型。该模型仍在开发中。然而,我们提出了一些令人鼓舞的探索性实验,并讨论了仍有待解决的问题。 摘要:One of the most discussed issues in graph generative modeling is the ordering of the representation. One solution consists of using equivariant generative functions, which ensure the ordering invariance. After having discussed some properties of such functions, we propose 3G-GAN, a 3-stages model relying on GANs and equivariant functions. The model is still under development. However, we present some encouraging exploratory experiments and discuss the issues still to be addressed.

【4】 Graph Neural Controlled Differential Equations for Traffic Forecasting 标题:用于交通量预测的图神经控制微分方程 链接:https://arxiv.org/abs/2112.03558

作者:Jeongwhan Choi,Hwangyong Choi,Jeehyun Hwang,Noseong Park 机构:Yonsei University, Seoul, South Korea 备注:Accepted by AAAI 2022 摘要:交通量预测是机器学习领域中最流行的时空任务之一。该领域的一种流行方法是将图卷积网络和递归神经网络结合起来进行时空处理。该方向竞争激烈,许多新方法被提出。本文提出了时空图神经控制微分方程(STG-NCDE)方法。神经控制微分方程(NCDE)是处理序列数据的一个突破性概念。我们扩展了这一概念并设计了两个NCDE:一个用于时间处理,另一个用于空间处理。然后,我们将它们合并到一个框架中。我们用6个基准数据集和20个基线进行了实验。STG-NCDE在所有情况下都显示出最佳的准确性,并以不小的优势超过了所有20个基线。 摘要:Traffic forecasting is one of the most popular spatio-temporal tasks in the field of machine learning. A prevalent approach in the field is to combine graph convolutional networks and recurrent neural networks for the spatio-temporal processing. There has been fierce competition and many novel methods have been proposed. In this paper, we present the method of spatio-temporal graph neural controlled differential equation (STG-NCDE). Neural controlled differential equations (NCDEs) are a breakthrough concept for processing sequential data. We extend the concept and design two NCDEs: one for the temporal processing and the other for the spatial processing. After that, we combine them into a single framework. We conduct experiments with 6 benchmark datasets and 20 baselines. STG-NCDE shows the best accuracy in all cases, outperforming all those 20 baselines by non-trivial margins.
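作为背景补充,下面给出神经受控微分方程(NCDE)最朴素的欧拉离散化示意(numpy,非论文实现):隐状态按 dz = f_θ(z) dX 随观测路径 X 演化;STG-NCDE 在此基础上分别为时间处理和空间处理各设计了一个 NCDE。示例中 f_theta 和初始映射用随机张量充当,仅为说明计算形式。

```python
import numpy as np

def ncde_euler(x_path, z0, f_theta):
    """神经受控微分方程(NCDE)的欧拉离散化示意:
    dz = f_theta(z) dX,其中 f_theta(z) 返回 (hidden_dim, input_dim) 矩阵,
    x_path 是形状 (T, input_dim) 的控制路径(例如插值后的交通观测序列)。"""
    z = z0.copy()
    for k in range(len(x_path) - 1):
        dx = x_path[k + 1] - x_path[k]      # 控制增量 dX
        z = z + f_theta(z) @ dx             # z_{k+1} = z_k + f(z_k) dX
    return z

# 示例:用随机张量充当 f_theta 与初始映射(实际方法中两者都是神经网络)
hidden_dim, input_dim = 16, 4
W = np.random.randn(hidden_dim, input_dim, hidden_dim) * 0.1
W0 = np.random.randn(hidden_dim, input_dim) * 0.5
f_theta = lambda z: np.tanh(W @ z)          # 输出 (hidden_dim, input_dim)
x_path = np.random.randn(12, input_dim)     # 长度为 12 的观测路径
z0 = np.tanh(W0 @ x_path[0])                # 初始隐状态由首个观测映射得到
print(ncde_euler(x_path, z0, f_theta).shape)  # (16,)
```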

【5】 Self-Organized Polynomial-Time Coordination Graphs 标题:自组织多项式时间协调图 链接:https://arxiv.org/abs/2112.03547

作者:Qianlan Yang,Weijun Dong,Zhizhou Ren,Jianhao Wang,Tonghan Wang,Chongjie Zhang 机构:Institute for Interdisciplinary Information Sciences, Tsinghua University, Department of Computer Science, University of Illinois at Urbana-Champaign 摘要:协调图是多agent强化学习中一种很有前途的agent协作建模方法。它将一个大型多代理系统分解为一组重叠的组,这些组表示底层的协调依赖关系。该范式中的一个关键挑战是计算基于图的值分解的最大值操作的复杂性。它是指分散约束优化问题(DCOP),其常数比近似是NP难问题。为了绕过这一基本难题,本文提出了一种新的方法,称为自组织多项式时间协调图(SOP-CG),该方法使用结构化图类来保证诱导DCOP具有足够的函数表达能力。我们将图的拓扑结构扩展为状态相关的,将图的选择表述为一个假想的agent,最后从统一的Bellman最优性方程导出一个端到端的学习范式。在实验中,我们表明,我们的方法学习可解释图拓扑,诱导有效的协调,并在各种协作多代理任务中提高性能。 摘要:Coordination graph is a promising approach to model agent collaboration in multi-agent reinforcement learning. It factorizes a large multi-agent system into a suite of overlapping groups that represent the underlying coordination dependencies. One critical challenge in this paradigm is the complexity of computing maximum-value actions for a graph-based value factorization. It refers to the decentralized constraint optimization problem (DCOP), which and whose constant-ratio approximation are NP-hard problems. To bypass this fundamental hardness, this paper proposes a novel method, named Self-Organized Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph classes to guarantee the optimality of the induced DCOPs with sufficient function expressiveness. We extend the graph topology to be state-dependent, formulate the graph selection as an imaginary agent, and finally derive an end-to-end learning paradigm from the unified Bellman optimality equation. In experiments, we show that our approach learns interpretable graph topologies, induces effective coordination, and improves performance across a variety of cooperative multi-agent tasks.

【6】 A Piece-wise Polynomial Filtering Approach for Graph Neural Networks 标题:图神经网络的分段多项式滤波方法 链接:https://arxiv.org/abs/2112.03499

作者:Vijay Lingam,Chanakya Ekbote,Manan Sharma,Rahul Ragesh,Arun Iyer,Sundararajan Sellamanickam 机构:Microsoft Research India 备注:28 pages, 9 figures, Under Review 摘要:图神经网络(GNNs)利用节点特征和输入图拓扑的信号来提高节点分类任务的性能。然而,这些模型在异亲图上的性能往往很差,其中连接的节点具有不同的标签。最近提出的GNN在具有不同同态水平的图上工作。其中,依赖多项式图滤波器的模型显示了良好的前景。我们观察到,这些多项式图滤波器模型的解也是超定方程组的解。它表明,在某些情况下,模型需要学习一个合理的高阶多项式。通过调查,我们发现所提出的模型由于其设计而不能有效地学习这些多项式。为了缓解这个问题,我们对图进行特征分解,并建议学习作用于谱的不同子集的多个自适应多项式滤波器。我们的理论和经验表明,我们提出的模型学习了更好的滤波器,从而提高了分类精度。我们研究了我们提出的模型的各个方面,包括对所使用的特征分量数量的依赖性、学习的潜在多项式滤波器以及单个多项式在节点分类任务中的性能。通过对大型图的评估,我们进一步证明了我们的模型是可伸缩的。与最先进的模型相比,我们的模型实现了高达5%的性能增益,总体上优于现有的基于多项式滤波器的方法。 摘要:Graph Neural Networks (GNNs) exploit signals from node features and the input graph topology to improve node classification task performance. However, these models tend to perform poorly on heterophilic graphs, where connected nodes have different labels. Recently proposed GNNs work across graphs having varying levels of homophily. Among these, models relying on polynomial graph filters have shown promise. We observe that solutions to these polynomial graph filter models are also solutions to an overdetermined system of equations. It suggests that in some instances, the model needs to learn a reasonably high order polynomial. On investigation, we find the proposed models ineffective at learning such polynomials due to their designs. To mitigate this issue, we perform an eigendecomposition of the graph and propose to learn multiple adaptive polynomial filters acting on different subsets of the spectrum. We theoretically and empirically show that our proposed model learns a better filter, thereby improving classification accuracy. We study various aspects of our proposed model including, dependency on the number of eigencomponents utilized, latent polynomial filters learned, and performance of the individual polynomials on the node classification task. We further show that our model is scalable by evaluating over large graphs. Our model achieves performance gains of up to 5% over the state-of-the-art models and outperforms existing polynomial filter-based approaches in general.
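下面用 numpy 勾勒"按谱的子区间分段学习多项式滤波器"的计算流程(根据摘要自行推测的示意,并非论文实现):对图拉普拉斯做特征分解,把特征值按升序平均分段,每段用一组独立的多项式系数构造滤波响应。系数在实际方法中是可学习参数,这里作为输入传入。

```python
import numpy as np

def piecewise_poly_filter(adj, x, coeffs_per_piece, n_pieces=2):
    """对图拉普拉斯做特征分解,把谱分成若干子区间,
    在每个子区间上用独立的多项式系数构造滤波器,再作用到节点特征 x 上。"""
    deg = np.diag(adj.sum(1))
    lap = deg - adj                               # 组合拉普拉斯
    evals, evecs = np.linalg.eigh(lap)            # 特征分解(特征值升序)
    out = np.zeros_like(x, dtype=float)
    segments = np.array_split(np.arange(len(evals)), n_pieces)
    for piece, idx in enumerate(segments):
        lam = evals[idx]
        # 该子区间上的多项式滤波响应 h(lam) = sum_k c_k * lam^k
        h = sum(c * lam ** k for k, c in enumerate(coeffs_per_piece[piece]))
        U = evecs[:, idx]
        out += U @ np.diag(h) @ (U.T @ x)         # 谱域滤波后变换回节点域
    return out

# 用法示例:4 节点环图,两段谱各用一个二阶多项式
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 3)
coeffs = [[1.0, -0.5, 0.1], [0.2, 0.3, -0.1]]     # 每段的多项式系数(实际中为可学习参数)
print(piecewise_poly_filter(A, X, coeffs).shape)   # (4, 3)
```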

【7】 GraphPAS: Parallel Architecture Search for Graph Neural Networks 标题:GraphPAS:图神经网络的并行结构搜索 链接:https://arxiv.org/abs/2112.03461

作者:Jiamin Chen,Jianliang Gao,Yibo Chen,Oloulade Babatounde Moctard,Tengfei Lyu,Zhao Li 机构:School of Computer Science and, Engineering, Central South University, State Grid Hunan Electric Power, Company Limited, Alibaba Group 备注:5 papes,3 figures,Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 摘要:近年来,随着图神经网络(GNNs)在非欧几里德数据上的成功应用,图神经结构搜索受到了广泛关注。然而,在巨大的搜索空间中探索所有可能的GNNs架构对于大型图形数据来说太耗时或不可能。在本文中,我们提出了一种用于图神经网络的并行图结构搜索(GraphPAS)框架。在GraphPAS中,我们通过设计一个基于共享的进化学习来并行探索搜索空间,它可以在不损失准确性的情况下提高搜索效率。另外,采用结构信息熵动态计算变异选择概率,减少空间探索。实验结果表明,GraphPAS在效率和准确性方面均优于现有模型。 摘要:Graph neural architecture search has received a lot of attention as Graph Neural Networks (GNNs) has been successfully applied on the non-Euclidean data recently. However, exploring all possible GNNs architectures in the huge search space is too time-consuming or impossible for big graph data. In this paper, we propose a parallel graph architecture search (GraphPAS) framework for graph neural networks. In GraphPAS, we explore the search space in parallel by designing a sharing-based evolution learning, which can improve the search efficiency without losing the accuracy. Additionally, architecture information entropy is adopted dynamically for mutation selection probability, which can reduce space exploration. The experimental result shows that GraphPAS outperforms state-of-art models with efficiency and accuracy simultaneously.

【8】 Feature Importance-aware Graph Attention Network and Dueling Double Deep Q-Network Combined Approach for Critical Node Detection Problems 标题:结合特征重要性感知图注意力网络与Dueling双重深度Q网络的关键节点检测方法 链接:https://arxiv.org/abs/2112.03404

作者:Xuwei Tan,Yangming Zhou,Zhang-Hua Fu,Mengchu Zhou 机构:Department of Computer Science and Engineering, East China University of Science and Technology, Sino-US Global Logistics Institute, Shanghai Jiao Tong University, Macau Institute of Systems Engineering, Macau University of Science and Technology 备注:10 pages, 3 figures 摘要:检测稀疏网络中的关键节点在许多应用领域都很重要。关键节点问题(CNP)的目的是从一个网络中找到一组关键节点,这些节点的删除会最大程度地降低剩余网络的成对连通性。由于其一般NP难性质,最先进的CNP解决方案基于启发式方法。在设计此类方法时,通常需要领域知识和反复试验,因此需要花费大量的精力和时间。本文提出了一种基于特征重要性的图注意网络用于节点表示,并将其与双深度Q网络相结合,首次提出了一种端到端的算法来求解CNP。它不需要任何特定于问题的知识或大多数现有方法所要求的标记数据集。一旦对模型进行了训练,就可以将其推广到处理各种类型的CNP(具有不同的大小和拓扑结构),而无需重新训练。在28个真实网络上进行的大量实验表明,该方法与最新的方法具有很高的可比性。它不需要任何特定于问题的知识,因此,通过使用现有的方法,它可以适用于许多应用,包括那些不可能的应用。它可以与一些局部搜索方法相结合,进一步提高其解的质量。大量的比较结果表明了该方法在解决CNP问题上的有效性。 摘要:Detecting critical nodes in sparse networks is important in a variety of application domains. A Critical Node Problem (CNP) aims to find a set of critical nodes from a network whose deletion maximally degrades the pairwise connectivity of the residual network. Due to its general NP-hard nature, state-of-the-art CNP solutions are based on heuristic approaches. Domain knowledge and trial-and-error are usually required when designing such approaches, thus consuming considerable effort and time. This work proposes a feature importance-aware graph attention network for node representation and combines it with dueling double deep Q-network to create an end-to-end algorithm to solve CNP for the first time. It does not need any problem-specific knowledge or labeled datasets as required by most of existing methods. Once the model is trained, it can be generalized to cope with various types of CNPs (with different sizes and topological structures) without re-training. Extensive experiments on 28 real-world networks show that the proposed method is highly comparable to state-of-the-art methods. It does not require any problem-specific knowledge and, hence, can be applicable to many applications including those impossible ones by using the existing approaches. It can be combined with some local search methods to further improve its solution quality. Extensive comparison results are given to show its effectiveness in solving CNP.
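为便于理解该任务的优化目标,下面给出关键节点问题(CNP)中"成对连通性"的计算示意(使用 networkx;只展示目标函数本身,并不是论文的 GAT + Dueling DQN 求解器):

```python
import itertools
import networkx as nx

def pairwise_connectivity(graph, removed_nodes):
    """CNP 的目标:删除 removed_nodes 后,剩余网络的成对连通性
    = 各连通分量内仍然连通的节点对数之和(越小说明破坏越彻底)。"""
    residual = graph.copy()
    residual.remove_nodes_from(removed_nodes)
    return sum(len(c) * (len(c) - 1) // 2 for c in nx.connected_components(residual))

# 示例:在一个小网络上穷举大小为 2 的关键节点集合(仅用于说明目标函数,实际网络需启发式/学习方法)
G = nx.karate_club_graph()
best = min(itertools.combinations(G.nodes, 2),
           key=lambda s: pairwise_connectivity(G, s))
print(best, pairwise_connectivity(G, best))
```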

【9】 Graph Neural Networks Accelerated Molecular Dynamics 标题:图神经网络加速分子动力学 链接:https://arxiv.org/abs/2112.03383

作者:Zijie Li,Kazem Meidani,Prakarsh Yadav,Amir Barati Farimani 机构:†Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh PA, USA, ‡Machine Learning Department, Carnegie Mellon University, Pittsburgh PA, USA, ¶Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh PA, USA 备注:preprint; under review 摘要:分子动力学(MD)模拟是理解物质动力学和结构的有力工具。由于MD的分辨率是原子级的,因此用飞秒积分实现长时间尺度的模拟非常昂贵。在每个MD步骤中,执行大量冗余计算,这些计算可以学习和避免。这些冗余计算可以用深度学习模型(如图形神经网络(GNN))代替和建模。在这项工作中,我们开发了一个GNN加速分子动力学(GAMD)模型,该模型可实现快速准确的力预测,并生成与经典MD模拟一致的轨迹。我们的结果表明,GAMD可以准确地预测两个典型的分子系统,Lennard-Jones(LJ)粒子和水(LJ+静电)的动力学。GAMD的学习和推理与规模无关,它可以在测试时扩展到更大的系统。我们还进行了一次全面的基准测试,将我们的GAMD实现与生产级MD软件进行了比较,结果表明,GAMD在大规模模拟方面与它们具有竞争力。 摘要:Molecular Dynamics (MD) simulation is a powerful tool for understanding the dynamics and structure of matter. Since the resolution of MD is atomic-scale, achieving long time-scale simulations with femtosecond integration is very expensive. In each MD step, numerous redundant computations are performed which can be learnt and avoided. These redundant computations can be surrogated and modeled by a deep learning model like a Graph Neural Network (GNN). In this work, we developed a GNN Accelerated Molecular Dynamics (GAMD) model that achieves fast and accurate force predictions and generates trajectories consistent with the classical MD simulations. Our results show that GAMD can accurately predict the dynamics of two typical molecular systems, Lennard-Jones (LJ) particles and Water (LJ+Electrostatics). GAMD's learning and inference are agnostic to the scale, where it can scale to much larger systems at test time. We also performed a comprehensive benchmark test comparing our implementation of GAMD to production-level MD softwares, where we showed GAMD is competitive with them on the large-scale simulation.
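下面的 numpy 草图演示 GAMD 这类方法的使用方式(根据摘要推测,非官方实现):把分子动力学中代价最高的力计算替换成一个已训练好的力预测器(此处用一个玩具 Lennard-Jones 函数占位),再用标准的速度 Verlet 积分推进轨迹。

```python
import numpy as np

def gnn_accelerated_md(pos, vel, predict_forces, mass=1.0, dt=1e-3, n_steps=100):
    """示意:force = predict_forces(pos) 由(已训练好的)图神经网络给出,
    随后用速度 Verlet 积分生成轨迹。"""
    forces = predict_forces(pos)
    traj = [pos.copy()]
    for _ in range(n_steps):
        # 速度 Verlet:先半步更新速度和位置,再用新位置处的力补另半步速度
        vel_half = vel + 0.5 * dt * forces / mass
        pos = pos + dt * vel_half
        forces = predict_forces(pos)            # GNN 代替经典力场求和
        vel = vel_half + 0.5 * dt * forces / mass
        traj.append(pos.copy())
    return np.stack(traj)

# 示例:用一个简单的 Lennard-Jones 风格函数充当"GNN 力预测器"(纯占位)
def toy_forces(pos, eps=1.0, sigma=1.0):
    diff = pos[:, None, :] - pos[None, :, :]
    r2 = (diff ** 2).sum(-1) + np.eye(len(pos))  # 对角加 1 避免除零
    inv6 = (sigma ** 2 / r2) ** 3
    mag = 24 * eps * (2 * inv6 ** 2 - inv6) / r2
    np.fill_diagonal(mag, 0.0)
    return (mag[..., None] * diff).sum(axis=1)

positions = np.random.rand(8, 3) * 3.0
velocities = np.zeros((8, 3))
print(gnn_accelerated_md(positions, velocities, toy_forces).shape)  # (101, 8, 3)
```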

【10】 Scalable Geometric Deep Learning on Molecular Graphs 标题:分子图上的可伸缩几何深度学习 链接:https://arxiv.org/abs/2112.03364

作者:Nathan C. Frey,Siddharth Samsi,Joseph McDonald,Lin Li,Connor W. Coley,Vijay Gadepally 机构:MIT 备注:7 pages, 3 figures, NeurIPS 2021 AI for Science workshop 摘要:由于应用科学、人工智能和高性能计算之间缺乏整合,分子和材料科学的深度学习受到限制。与训练数据量、模型架构的大小和复杂性以及计算基础设施的规模相关的瓶颈都是限制分子和材料深度学习规模的关键因素。在这里,我们展示了$\textit{LitMatter}$,一个用于扩展分子深度学习方法的轻量级框架。我们在400多个GPU上训练了四种图形神经网络结构,并研究了这些方法的伸缩行为。根据模型体系结构的不同,可以看到训练时间加速到60倍。经验神经标度关系量化了依赖于模型的标度,实现了最佳计算资源分配和可伸缩分子几何深度学习模型实现的识别。 摘要:Deep learning in molecular and materials sciences is limited by the lack of integration between applied science, artificial intelligence, and high-performance computing. Bottlenecks with respect to the amount of training data, the size and complexity of model architectures, and the scale of the compute infrastructure are all key factors limiting the scaling of deep learning for molecules and materials. Here, we present $\textit{LitMatter}$, a lightweight framework for scaling molecular deep learning methods. We train four graph neural network architectures on over 400 GPUs and investigate the scaling behavior of these methods. Depending on the model architecture, training time speedups up to $60\times$ are seen. Empirical neural scaling relations quantify the model-dependent scaling and enable optimal compute resource allocation and the identification of scalable molecular geometric deep learning model implementations.

【11】 Dynamic Graph Learning-Neural Network for Multivariate Time Series Modeling 标题:多变量时间序列建模的动态图学习-神经网络 链接:https://arxiv.org/abs/2112.03273

作者:Zhuoling Li,Gaowei Zhang,Lingyu Xu,Jie Yu 机构:School of Computer Engineering and Science, Shanghai University, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Shanghai Institute for Advanced Communication and Data Science, Shanghai University 摘要:多元时间序列预测是一项具有挑战性的任务,因为数据涉及长期和短期模式的混合,变量之间具有动态时空依赖性。现有的图神经网络(GNN)通常使用预定义的空间图或学习的固定邻接图来建模多变量关系。它限制了GNN的应用,无法应对上述挑战。在本文中,我们提出了一个新的框架,即静态和动态图学习神经网络(SDGL)。该模型从数据中获取静态和动态图形矩阵,分别对长期和短期模式进行建模。静态矩阵通过节点嵌入来捕获固定的长期关联模式,并利用图的正则性来控制学习静态图的质量。为了捕获变量之间的动态依赖关系,我们提出了一种基于变化节点特征和静态节点嵌入生成时变矩阵的动态图学习方法。在该方法中,我们将学习到的静态图信息整合为归纳偏差,从而更好地构建动态图和局部时空模式。在两个具有额外结构信息的交通数据集和四个时间序列数据集上进行了大量实验,结果表明,我们的方法在几乎所有数据集上都达到了最先进的性能。如果论文被接受,我将在github上打开源代码。 摘要:Multivariate time series forecasting is a challenging task because the data involves a mixture of long- and short-term patterns, with dynamic spatio-temporal dependencies among variables. Existing graph neural networks (GNN) typically model multivariate relationships with a pre-defined spatial graph or learned fixed adjacency graph. It limits the application of GNN and fails to handle the above challenges. In this paper, we propose a novel framework, namely static- and dynamic-graph learning-neural network (SDGL). The model acquires static and dynamic graph matrices from data to model long- and short-term patterns respectively. Static matric is developed to capture the fixed long-term association pattern via node embeddings, and we leverage graph regularity for controlling the quality of the learned static graph. To capture dynamic dependencies among variables, we propose dynamic graphs learning method to generate time-varying matrices based on changing node features and static node embeddings. And in the method, we integrate the learned static graph information as inductive bias to construct dynamic graphs and local spatio-temporal patterns better. Extensive experiments are conducted on two traffic datasets with extra structural information and four time series datasets, which show that our approach achieves state-of-the-art performance on almost all datasets. If the paper is accepted, I will open the source code on github.
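下面用 numpy 粗略示意 SDGL 中"静态图 + 动态图"两类邻接矩阵的构造思路(依据摘要推测;把静态嵌入信息融入动态图的方式这里用简单加权代替,属示例假设,并非论文实现):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def static_graph(emb1, emb2):
    """静态图:由两组可学习的节点嵌入生成固定邻接矩阵(长期关联模式)。"""
    return softmax(np.maximum(emb1 @ emb2.T, 0.0))        # ReLU + 行 softmax

def dynamic_graph(node_feat_t, emb1, emb2, alpha=0.5):
    """动态图:由当前时刻的节点特征生成时变邻接矩阵,
    并把静态嵌入信息作为归纳偏置融合进来(此处用加权平均,属示意)。"""
    dyn = softmax(np.maximum(node_feat_t @ node_feat_t.T, 0.0))
    return alpha * dyn + (1 - alpha) * static_graph(emb1, emb2)

# 示例:10 个变量(节点),嵌入维度 4,某一时刻的 6 维特征
n, d = 10, 4
E1, E2 = np.random.randn(n, d), np.random.randn(n, d)
X_t = np.random.randn(n, 6)
print(static_graph(E1, E2).shape, dynamic_graph(X_t, E1, E2).shape)  # (10, 10) (10, 10)
```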

【12】 Multi-scale Graph Convolutional Networks with Self-Attention 标题:具有自关注的多尺度图卷积网络 链接:https://arxiv.org/abs/2112.03262

作者:Zhilong Xiong,Jia Cai 机构:, School of Statistics and Mathematics, Guangdong University of Finance & Economics, Guangzhou, Guangdong, China. 摘要:图卷积网络(GCN)近年来在处理各种图结构数据方面取得了显著的学习能力。一般来说,深度GCN效果不佳,因为传统GCN中的图卷积是拉普拉斯平滑的一种特殊形式,使得不同节点的表示难以区分。在文献中,GCNs采用了多尺度信息来增强GCNs的表达能力。然而,过度平滑现象作为GCN的一个关键问题仍有待解决和研究。在本文中,我们提出了两个新的多尺度GCN框架,将自我注意机制和多尺度信息融入GCN的设计中。我们的方法大大提高了GCNs模型的计算效率和预测精度。在节点分类和图分类上的大量实验证明了在几个最先进的GCN上的有效性。值得注意的是,提出的两种体系结构可以有效地缓解GCN的过度平滑问题,并且我们的模型层甚至可以增加到$64$。 摘要:Graph convolutional networks (GCNs) have achieved remarkable learning ability for dealing with various graph structural data recently. In general, deep GCNs do not work well since graph convolution in conventional GCNs is a special form of Laplacian smoothing, which makes the representation of different nodes indistinguishable. In the literature, multi-scale information was employed in GCNs to enhance the expressive power of GCNs. However, over-smoothing phenomenon as a crucial issue of GCNs remains to be solved and investigated. In this paper, we propose two novel multi-scale GCN frameworks by incorporating self-attention mechanism and multi-scale information into the design of GCNs. Our methods greatly improve the computational efficiency and prediction accuracy of the GCNs model. Extensive experiments on both node classification and graph classification demonstrate the effectiveness over several state-of-the-art GCNs. Notably, the proposed two architectures can efficiently mitigate the over-smoothing problem of GCNs, and the layer of our model can even be increased to $64$.

Transformer(1篇)

【1】 raceBERT -- A Transformer-based Model for Predicting Race from Names 标题:RaceBERT--一种基于Transformer的从名字预测种族的模型 链接:https://arxiv.org/abs/2112.03807

作者:Prasanna Parasurama 机构:New York University 备注:See this http URL 摘要:本文介绍了raceBERT——一个基于Transformer的模型,用于从名称中的字符序列预测种族,以及一个附带的python包。使用在美国佛罗里达州选民登记数据集上训练的基于Transformer的模型,该模型预测一个名字属于5个美国人口普查种族类别(白人、黑人、西班牙裔、亚裔及太平洋岛民、美洲印第安人和阿拉斯加土著)的可能性。我以Sood和Laohaprapanon(2018)为基础,将他们的LSTM模型替换为基于Transformer的模型(预训练的BERT模型和从头训练的roBERTa模型),并比较结果。据我所知,raceBERT在使用姓名进行种族预测方面取得了最先进的结果,平均f1分数为0.86,比之前的最先进水平提高了4.1%,非白人姓名的结果提高了15-17%。 摘要:This paper presents raceBERT -- a transformer-based model for predicting race from character sequences in names, and an accompanying python package. Using a transformer-based model trained on a U.S. Florida voter registration dataset, the model predicts the likelihood of a name belonging to 5 U.S. census race categories (White, Black, Hispanic, Asian & Pacific Islander, American Indian & Alaskan Native). I build on Sood and Laohaprapanon (2018) by replacing their LSTM model with transformer-based models (pre-trained BERT model, and a roBERTa model trained from scratch), and compare the results. To the best of my knowledge, raceBERT achieves state-of-the-art results in race prediction using names, with an average f1-score of 0.86 -- a 4.1% improvement over the previous state-of-the-art, and improvements between 15-17% for non-white names.

GAN|对抗|攻击|生成相关(9篇)

【1】 Generation of Non-Deterministic Synthetic Face Datasets Guided by Identity Priors 标题:基于身份先验的非确定性合成人脸数据集的生成 链接:https://arxiv.org/abs/2112.03632

作者:Marcel Grimmer,Haoyu Zhang,Raghavendra Ramachandra,Kiran Raja,Christoph Busch 机构: NBL - Norwegian Biometrics Laboratory, NTNU, Norway, dasec - Biometrics and Internet Securiy Research Group, HDA, Germany 备注:None 摘要:通过人脸识别实现高度安全的应用(如越境)需要通过大规模数据进行广泛的生物特征性能测试。然而,使用真实的人脸图像会引起人们对隐私的担忧,因为法律不允许将这些图像用于最初计划之外的其他目的。使用具有代表性的人脸数据和人脸数据子集也可能导致不必要的人口统计偏差,并导致数据集的不平衡。克服这些问题的一个可能的解决方案是用合成生成的样本替换真实的人脸图像。虽然生成合成图像得益于计算机视觉的最新进展,但生成具有类似真实世界变化的相同合成身份的多个样本(即配对样本)仍然没有得到解决。本文提出了一种利用StyleGAN结构良好的潜在空间生成匹配人脸图像的非确定性方法。通过操纵潜在向量生成匹配样本,更准确地说,我们利用主成分分析(PCA)在潜在空间中定义语义上有意义的方向,并使用预训练的人脸识别系统控制原始样本和匹配样本之间的相似性。我们创建了一个新的合成人脸图像数据集(SymFace),该数据集由77034个样本组成,其中包括25919个合成ID。通过使用成熟的人脸图像质量指标进行分析,我们展示了模拟真实生物特征数据特征的合成样本在生物特征质量方面的差异。分析及其结果表明,使用所提议的方法创建的合成样本作为替代真实生物测定数据的可行替代方案。 摘要:Enabling highly secure applications (such as border crossing) with face recognition requires extensive biometric performance tests through large scale data. However, using real face images raises concerns about privacy as the laws do not allow the images to be used for other purposes than originally intended. Using representative and subsets of face data can also lead to unwanted demographic biases and cause an imbalance in datasets. One possible solution to overcome these issues is to replace real face images with synthetically generated samples. While generating synthetic images has benefited from recent advancements in computer vision, generating multiple samples of the same synthetic identity resembling real-world variations is still unaddressed, i.e., mated samples. This work proposes a non-deterministic method for generating mated face images by exploiting the well-structured latent space of StyleGAN. Mated samples are generated by manipulating latent vectors, and more precisely, we exploit Principal Component Analysis (PCA) to define semantically meaningful directions in the latent space and control the similarity between the original and the mated samples using a pre-trained face recognition system. We create a new dataset of synthetic face images (SymFace) consisting of 77,034 samples including 25,919 synthetic IDs. Through our analysis using well-established face image quality metrics, we demonstrate the differences in the biometric quality of synthetic samples mimicking characteristics of real biometric data. The analysis and results thereof indicate the use of synthetic samples created using the proposed approach as a viable alternative to replacing real biometric data.
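下面的 numpy 草图只演示摘要中潜空间操作的大致逻辑(非论文实现):对采样得到的 StyleGAN 潜向量做 PCA 得到语义方向,沿方向扰动原始潜向量生成配对样本,并用人脸识别嵌入的余弦相似度筛选。其中生成器 G 与 face_embed 都是假设存在的外部模型,示例里用占位函数代替。

```python
import numpy as np

def mated_samples_via_pca(w_latents, w_origin, face_embed, n_dirs=5,
                          step=1.5, sim_threshold=0.6):
    """沿潜空间主成分方向扰动 w_origin,生成"同一合成身份"的配对潜向量,
    并用(假设的)人脸嵌入的余弦相似度筛选。"""
    mean = w_latents.mean(axis=0)
    _, _, vt = np.linalg.svd(w_latents - mean, full_matrices=False)
    directions = vt[:n_dirs]                    # 前 n_dirs 个主成分方向
    mates = []
    for d in directions:
        for s in (-step, step):
            w_new = w_origin + s * d
            # 实际流程应为 img = G(w_new) 后再取人脸嵌入;此处直接对潜向量取嵌入作占位
            e0, e1 = face_embed(w_origin), face_embed(w_new)
            cos = e0 @ e1 / (np.linalg.norm(e0) * np.linalg.norm(e1) + 1e-8)
            if cos > sim_threshold:             # 相似度足够高才算同一身份的配对样本
                mates.append(w_new)
    return mates

# 用法示例(全部为随机占位,仅演示调用方式)
W = np.random.randn(2000, 512)                  # 采样得到的潜向量集合
w0 = np.random.randn(512)
fake_embed = lambda w: w[:128]                  # 占位的"人脸嵌入"函数
print(len(mated_samples_via_pca(W, w0, fake_embed)))
```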

【2】 Membership Inference Attacks From First Principles 标题:基于第一性原理的成员推理攻击 链接:https://arxiv.org/abs/2112.03570

作者:Nicholas Carlini,Steve Chien,Milad Nasr,Shuang Song,Andreas Terzis,Florian Tramer 机构:Florian Tramer, Google Research, University of Massachusetts Amherst 摘要:成员资格推理攻击允许对手查询经过训练的机器学习模型,以预测模型的训练数据集中是否包含特定示例。这些攻击目前使用平均案例“准确性”指标进行评估,该指标无法描述攻击是否能够自信地识别训练集的任何成员。我们认为,应该通过计算低(例如,<0.1%)假阳性率下的真实阳性率来评估攻击,并发现以前的大多数攻击在以这种方式评估时表现不佳。为了解决这个问题,我们开发了一种似然比攻击(LiRA),它仔细地结合了文献中的多种观点。我们的攻击在较低的误报率下比以前强大10倍,并且严格控制了对现有指标的先前攻击。 摘要:A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset. These attacks are currently evaluated using average-case "accuracy" metrics that fail to characterize whether the attack can confidently identify any members of the training set. We argue that attacks should instead be evaluated by computing their true-positive rate at low (e.g., <0.1%) false-positive rates, and find most prior attacks perform poorly when evaluated in this way. To address this we develop a Likelihood Ratio Attack (LiRA) that carefully combines multiple ideas from the literature. Our attack is 10x more powerful at low false-positive rates, and also strictly dominates prior attacks on existing metrics.
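下面用 numpy 示意论文强调的两个要点(根据摘要复述的简化版本,非官方实现):一是在很低的假阳性率下计算真阳性率作为评估指标;二是 LiRA 式的似然比打分——用影子模型得到"样本在/不在训练集中"两种情形下的损失分布,各拟合一个高斯后取对数似然比。

```python
import numpy as np

def tpr_at_fpr(scores_members, scores_nonmembers, target_fpr=0.001):
    """在给定的(很低的)假阳性率下计算真阳性率;分数越大越倾向于判定"是成员"。"""
    # 阈值取非成员分数的 (1 - target_fpr) 分位数,保证 FPR 不超过目标值
    thresh = np.quantile(scores_nonmembers, 1.0 - target_fpr)
    return float((scores_members > thresh).mean())

def lira_score(loss_target, losses_in, losses_out):
    """LiRA 式打分示意:对"在训练集中/不在训练集中"两组影子模型损失各拟合一个高斯,
    再对目标模型在该样本上的损失取对数似然比。"""
    def log_gauss(x, mu, sigma):
        return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma + 1e-12)
    mu_in, sd_in = np.mean(losses_in), np.std(losses_in) + 1e-6
    mu_out, sd_out = np.mean(losses_out), np.std(losses_out) + 1e-6
    return log_gauss(loss_target, mu_in, sd_in) - log_gauss(loss_target, mu_out, sd_out)

# 示例:人工构造两组攻击分数,计算 FPR=0.1% 时的 TPR
members = np.random.normal(1.0, 1.0, 10000)
nonmembers = np.random.normal(0.0, 1.0, 100000)
print(tpr_at_fpr(members, nonmembers, 0.001))
```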

【3】 A Generic Approach for Enhancing GANs by Regularized Latent Optimization 标题:一种通过正则化潜在优化增强生成对抗网络(GAN)的通用方法 链接:https://arxiv.org/abs/2112.03502

作者:Yufan Zhou,Chunyuan Li,Changyou Chen,Jinhui Xu 机构:State University of New York at Buffalo, Microsoft Research, Redmond 摘要:随着模型复杂性和数据量的快速增长,训练深层生成模型(deep generative models,DGMs)以获得更好的性能已成为越来越重要的挑战。以前关于这个问题的研究主要集中在通过引入新的目标函数或设计更具表现力的模型体系结构来改进DGMs。然而,这种方法通常会引入更多的计算和/或设计开销。为了解决这些问题,我们在本文中介绍了一个名为"生成模型推理"(generative-model inference)的通用框架,该框架能够在各种应用场景中有效且无缝地增强预先训练的GAN。我们的基本思想是使用Wasserstein梯度流技术有效地推断给定需求下的最佳潜在分布,而不是重新训练或微调预先训练的模型参数。在图像生成、图像翻译、文本到图像生成、图像修复和文本引导图像编辑等应用上的大量实验结果表明了我们提出的框架的有效性和优越性。 摘要:With the rapidly growing model complexity and data volume, training deep generative models (DGMs) for better performance has becoming an increasingly more important challenge. Previous research on this problem has mainly focused on improving DGMs by either introducing new objective functions or designing more expressive model architectures. However, such approaches often introduce significantly more computational and/or designing overhead. To resolve such issues, we introduce in this paper a generic framework called \emph{generative-model inference} that is capable of enhancing pre-trained GANs effectively and seamlessly in a variety of application scenarios. Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques, instead of re-training or fine-tuning pre-trained model parameters. Extensive experimental results on applications like image generation, image translation, text-to-image generation, image inpainting, and text-guided image editing suggest the effectiveness and superiority of our proposed framework.

【4】 Generative Adversarial Networks for Labeled Data Creation for Structural Damage Detection 标题:用于结构损伤检测标签数据生成的生成式对抗性网络 链接:https://arxiv.org/abs/2112.03478

作者:Furkan Luleci,F. Necati Catbas,Onur Avci 机构:Department of Civil, Environmental, and Construction Engineering, University of Central Florida, Orlando, FL, USA, Department of Civil, Construction, and Environmental Engineering, Iowa State University, Ames, IA 摘要:在过去的几十年里,数据科学领域取得了巨大的进步,其他学科也不断从中受益。结构健康监测(SHM)是利用人工智能(AI)如机器学习(ML)和深度学习(DL)算法,根据收集的数据对土木结构进行状态评估的领域之一。ML和DL方法需要大量数据用于训练程序;然而,在SHM中,从土建结构收集的数据非常详尽;尤其是获取有用的数据(与损伤相关的数据)可能非常具有挑战性。本文采用梯度惩罚的一维瓦瑟斯坦深度卷积生成对抗网络(1-D WDCGAN-GP)生成综合标记振动数据。然后,利用1-D深卷积神经网络(1-D DCNN)对不同水平的综合增强振动数据集进行结构损伤检测。损伤检测结果表明,一维WDCGAN-GP可以成功地解决基于振动的土木结构损伤诊断中的数据不足问题。关键词:结构健康监测(SHM)、结构损伤诊断、结构损伤检测、1-D深卷积神经网络(1-D DCNN)、1-D生成对抗网络(1-D GAN)、深卷积生成对抗网络(DCGAN)、带梯度惩罚的Wasserstein生成对抗网络(WGAN-GP) 摘要:There has been a drastic progression in the field of Data Science in the last few decades and other disciplines have been continuously benefitting from it. Structural Health Monitoring (SHM) is one of those fields that use Artificial Intelligence (AI) such as Machine Learning (ML) and Deep Learning (DL) algorithms for condition assessment of civil structures based on the collected data. The ML and DL methods require plenty of data for training procedures; however, in SHM, data collection from civil structures is very exhaustive; particularly getting useful data (damage associated data) can be very challenging. This paper uses 1-D Wasserstein Deep Convolutional Generative Adversarial Networks using Gradient Penalty (1-D WDCGAN-GP) for synthetic labeled vibration data generation. Then, implements structural damage detection on different levels of synthetically enhanced vibration datasets by using 1-D Deep Convolutional Neural Network (1-D DCNN). The damage detection results show that the 1-D WDCGAN-GP can be successfully utilized to tackle data scarcity in vibration-based damage diagnostics of civil structures. Keywords: Structural Health Monitoring (SHM), Structural Damage Diagnostics, Structural Damage Detection, 1-D Deep Convolutional Neural Networks (1-D DCNN), 1-D Generative Adversarial Networks (1-D GAN), Deep Convolutional Generative Adversarial Networks (DCGAN), Wasserstein Generative Adversarial Networks with Gradient Penalty (WGAN-GP)
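作为参考,下面给出 WGAN-GP 中梯度惩罚项的一个通用 PyTorch 写法(这是公开的标准做法;判别器结构只是占位示例,并非论文中的一维网络配置):

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP 的梯度惩罚项:在真实样本与生成样本的随机插值点上,
    约束判别器梯度范数接近 1。real/fake 形状如 (batch, 1, length) 的一维振动信号。"""
    batch = real.size(0)
    eps = torch.rand(batch, 1, 1, device=real.device)       # 每个样本一个插值系数
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_interp = critic(interp)
    grads = torch.autograd.grad(outputs=d_interp, inputs=interp,
                                grad_outputs=torch.ones_like(d_interp),
                                create_graph=True, retain_graph=True)[0]
    grads = grads.view(batch, -1)
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()

# 用法示例:一个极简的一维卷积判别器(占位结构)
critic = torch.nn.Sequential(
    torch.nn.Conv1d(1, 8, kernel_size=5, stride=2, padding=2),  # 256 -> 128
    torch.nn.LeakyReLU(0.2),
    torch.nn.Flatten(),
    torch.nn.Linear(8 * 128, 1),
)
real = torch.randn(16, 1, 256)
fake = torch.randn(16, 1, 256)
print(gradient_penalty(critic, real, fake).item())
```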

【5】 BDFA: A Blind Data Adversarial Bit-flip Attack on Deep Neural Networks 标题:BDFA:一种针对深度神经网络的盲数据对抗性比特翻转攻击 链接:https://arxiv.org/abs/2112.03477

作者:Behnam Ghavami,Mani Sadati,Mohammad Shahidzadeh,Zhenman Fang,Lesley Shannon 机构:Simon Fraser University, Burnaby, BC, Canada 摘要:针对神经网络权重的对抗性位翻转攻击(BFA)通过翻转极少量的位,会导致灾难性的精度下降。先前的位翻转攻击技术的一个主要缺点是它们依赖于测试数据。对于包含敏感或专有数据的应用程序,这通常是不可能的。在本文中,我们提出了盲数据对抗比特翻转攻击(BDFA),这是一种使BFA无需访问训练或测试数据的新技术。这是通过优化合成数据集来实现的,该数据集的设计目的是匹配网络不同层和目标标签之间的批量标准化统计数据。实验结果表明,BDFA仅需4位翻转,就能将ResNet50的准确率从75.96%显著降低到13.94%。 摘要:Adversarial bit-flip attack (BFA) on Neural Network weights can result in catastrophic accuracy degradation by flipping a very small number of bits. A major drawback of prior bit flip attack techniques is their reliance on test data. This is frequently not possible for applications that contain sensitive or proprietary data. In this paper, we propose Blind Data Adversarial Bit-flip Attack (BDFA), a novel technique to enable BFA without any access to the training or testing data. This is achieved by optimizing for a synthetic dataset, which is engineered to match the statistics of batch normalization across different layers of the network and the targeted label. Experimental results show that BDFA could decrease the accuracy of ResNet50 significantly from 75.96\% to 13.94\% with only 4 bits flips.
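下面用 numpy 演示比特翻转攻击中最底层的操作——翻转一个 8 位量化权重中的某一位(示意代码;BDFA 的核心贡献在于如何在无数据条件下利用批归一化统计量挑选要翻转的比特,这里不涉及):

```python
import numpy as np

def flip_bit_int8(weight_int8, bit_index):
    """翻转一个 8 位量化权重的第 bit_index 位(0 为最低位,7 为符号位)。"""
    w = np.array([weight_int8], dtype=np.int8)
    as_uint = w.view(np.uint8)              # 按位重新解释为无符号整数(共享内存)
    as_uint ^= np.uint8(1 << bit_index)     # 异或翻转目标位
    return int(w[0])

# 示例:翻转最高位(符号位)通常造成最大的数值扰动
for b in (0, 3, 7):
    print(b, 93, "->", flip_bit_int8(93, b))
```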

【6】 Top-Down Deep Clustering with Multi-generator GANs 标题:基于多生成器GAN的自上而下深度聚类 链接:https://arxiv.org/abs/2112.03398

作者:Daniel de Mello,Renato Assunção,Fabricio Murai 机构: Department of Computer Science, Universidade Federal de Minas Gerais, ESRI Inc. 备注:Accepted to AAAI 2021 摘要:深度聚类(DC)利用深度体系结构的表示能力来学习最适合聚类分析的嵌入空间。这种方法过滤出与聚类无关的低级信息,并在高维数据空间中取得了显著的成功。一些DC方法采用生成性对抗网络(GAN),受这些模型能够隐式学习的强大潜在表示的激励。在这项工作中,我们提出了HC-MGAN,这是一种基于具有多个生成器的GANs(MGAN)的新技术,该技术尚未用于聚类。我们的方法的灵感来自这样一个观察:MGAN的每个生成器都倾向于生成与真实数据分布的子区域相关的数据。我们使用该聚类生成来训练分类器,以推断给定图像来自哪个生成器,从而为真实分布提供语义上有意义的聚类。此外,我们将方法设计为在自上而下的层次聚类树中执行,据我们所知,这是首个层次化的DC方法。我们进行了多组实验,将所提方法与最近的DC方法进行比较,获得了有竞争力的结果。最后,我们对层次聚类树进行了探索性分析,重点介绍了它在语义一致模式的层次结构中如何准确地组织数据。 摘要:Deep clustering (DC) leverages the representation power of deep architectures to learn embedding spaces that are optimal for cluster analysis. This approach filters out low-level information irrelevant for clustering and has proven remarkably successful for high dimensional data spaces. Some DC methods employ Generative Adversarial Networks (GANs), motivated by the powerful latent representations these models are able to learn implicitly. In this work, we propose HC-MGAN, a new technique based on GANs with multiple generators (MGANs), which have not been explored for clustering. Our method is inspired by the observation that each generator of a MGAN tends to generate data that correlates with a sub-region of the real data distribution. We use this clustered generation to train a classifier for inferring from which generator a given image came from, thus providing a semantically meaningful clustering for the real distribution. Additionally, we design our method so that it is performed in a top-down hierarchical clustering tree, thus proposing the first hierarchical DC method, to the best of our knowledge. We conduct several experiments to evaluate the proposed method against recent DC methods, obtaining competitive results. Last, we perform an exploratory analysis of the hierarchical clustering tree that highlights how accurately it organizes the data in a hierarchy of semantically coherent patterns.

【7】 Adversarial Machine Learning In Network Intrusion Detection Domain: A Systematic Review 标题:网络入侵检测领域的对抗性机器学习研究综述 链接:https://arxiv.org/abs/2112.03315

作者:Huda Ali Alatwi,Charles Morisset 机构:Newcastle University, UK, Tabuk University, KSA 摘要:由于深度学习技术在各个领域取得了巨大的成功,因此它越来越多地被用于设计网络入侵检测解决方案,以高准确率和最小的特征工程检测和缓解未知和已知攻击。然而,已经发现,深度学习模型容易受到数据实例的影响,这些数据实例可能误导模型做出错误的分类决策,即所谓的(对抗性示例)。此类漏洞允许攻击者通过向恶意流量中添加小的狡猾干扰来攻击NIDS,以逃避检测并破坏系统的关键功能。深度对抗学习问题在计算机视觉领域得到了广泛的研究;然而,在网络安全应用中,它仍然是一个开放的研究领域。因此,本综述探讨了在网络入侵检测领域中采用对抗式机器学习的不同方面的研究,以便为潜在的解决方案提供方向。首先,根据它们在生成对抗性示例、评估基于ML的NID对对抗性示例的鲁棒性以及保护这些模型免受此类攻击方面的贡献,对调查研究进行分类。其次,我们强调调查研究中确定的特征。此外,我们还讨论了现有通用对抗性攻击在NIDS领域的适用性、在现实场景中发起拟议攻击的可行性以及现有缓解方案的局限性。 摘要:Due to their massive success in various domains, deep learning techniques are increasingly used to design network intrusion detection solutions that detect and mitigate unknown and known attacks with high accuracy detection rates and minimal feature engineering. However, it has been found that deep learning models are vulnerable to data instances that can mislead the model to make incorrect classification decisions so-called (adversarial examples). Such vulnerability allows attackers to target NIDSs by adding small crafty perturbations to the malicious traffic to evade detection and disrupt the system's critical functionalities. The problem of deep adversarial learning has been extensively studied in the computer vision domain; however, it is still an area of open research in network security applications. Therefore, this survey explores the researches that employ different aspects of adversarial machine learning in the area of network intrusion detection in order to provide directions for potential solutions. First, the surveyed studies are categorized based on their contribution to generating adversarial examples, evaluating the robustness of ML-based NIDs towards adversarial examples, and defending these models against such attacks. Second, we highlight the characteristics identified in the surveyed research. Furthermore, we discuss the applicability of the existing generic adversarial attacks for the NIDS domain, the feasibility of launching the proposed attacks in real-world scenarios, and the limitations of the existing mitigation solutions.

【8】 Synthetic ECG Signal Generation Using Generative Neural Networks 标题:基于生成式神经网络的合成心电信号生成 链接:https://arxiv.org/abs/2112.03268

作者:Edmond Adib,Fatemeh Afghah,John J. Prevost 机构: Electrical and Computer Engineering Department, University of Texas at San Antonio, (UTSA), San Antonio, TX, USA, Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA 摘要:由于缺乏异常病例,心电图(ECG)数据集往往高度不平衡。此外,由于隐私问题,真实患者心电图的使用受到高度管制。因此,总是需要更多的ECG数据,尤其是训练自动诊断机器学习模型,当在平衡数据集上训练时,这些模型的性能更好。我们研究了生成性对抗网络(GAN)家族中5种不同模型的合成心电图生成能力,并比较了它们的性能,重点是正常心动周期。采用动态时间规整(DTW)、Fréchet和欧几里德距离函数对性能进行定量测量。提出并应用了五种不同的方法来评估生成的心拍。我们还提出了3个新概念(阈值、可接受心拍和生产率),并将其与上述方法结合使用,作为模型间比较的系统方法。结果表明,所有被测试的模型都能在一定程度上成功地大量生成形态特征高度相似的可接受心拍,并且所有这些模型都有可能用于扩充不平衡的数据集。然而,对生成心拍的目视检查更有利于BiLSTM-DC GAN和WGAN,因为它们产生的心拍在统计上更可接受。此外,就生产率而言,经典GAN以72%的生产率更胜一筹。 摘要:Electrocardiogram (ECG) datasets tend to be highly imbalanced due to the scarcity of abnormal cases. Additionally, the use of real patients' ECG is highly regulated due to privacy issues. Therefore, there is always a need for more ECG data, especially for the training of automatic diagnosis machine learning models, which perform better when trained on a balanced dataset. We studied the synthetic ECG generation capability of 5 different models from the generative adversarial network (GAN) family and compared their performances, the focus being only on Normal cardiac cycles. Dynamic Time Warping (DTW), Fréchet, and Euclidean distance functions were employed to quantitatively measure performance. Five different methods for evaluating generated beats were proposed and applied. We also proposed 3 new concepts (threshold, accepted beat and productivity rate) and employed them along with the aforementioned methods as a systematic way for comparison between models. The results show that all the tested models can to an extent successfully mass-generate acceptable heartbeats with high similarity in morphological features, and potentially all of them can be used to augment imbalanced datasets. However, visual inspections of generated beats favor BiLSTM-DC GAN and WGAN, as they produce statistically more acceptable beats. Also, with regards to productivity rate, the Classic GAN is superior with a 72% productivity rate.
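下面给出动态时间规整(DTW)距离的朴素 numpy 实现,并顺带示意论文提出的"生产率"概念的一种可能计算方式(阈值与判定标准均为示例假设,非论文原定义):

```python
import numpy as np

def dtw_distance(a, b):
    """DTW 距离的朴素动态规划实现,用于衡量生成心拍与真实心拍的形态相似度。"""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def productivity_rate(generated, reference, threshold):
    """"生产率"的一种示意:与参考心拍的 DTW 距离低于阈值(即"可接受")的生成样本比例。"""
    return float(np.mean([dtw_distance(g, reference) < threshold for g in generated]))

# 示例:两个相位略有错位的"心拍"波形,DTW 距离远小于逐点欧氏距离
t = np.linspace(0, 1, 200)
beat_a = np.exp(-((t - 0.48) ** 2) / 0.001)
beat_b = np.exp(-((t - 0.52) ** 2) / 0.001)
print("DTW:", round(dtw_distance(beat_a, beat_b), 4),
      "Euclidean:", round(float(np.linalg.norm(beat_a - beat_b)), 4))
```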

【9】 Contrastive Cycle Adversarial Autoencoders for Single-cell Multi-omics Alignment and Integration 标题:用于单细胞多组学比对和整合的对比循环对抗性自动编码器 链接:https://arxiv.org/abs/2112.03266

作者:Xuesong Wang,Zhihang Hu,Tingyang Yu,Ruijie Wang,Yumeng Wei,Juan Shu,Jianzhu Ma,Yu Li 机构:Department of Computer Science and Engineering, CUHK, Hong Kong SAR, China, The CUHK Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen, China, Purdue University, West Lafayette, IN , United States 摘要:Muilti模态数据在生物学中无处不在,特别是我们已经进入了多组学时代,我们可以从不同方面(组学)测量相同的生物对象(细胞),以提供对细胞系统更全面的了解。在处理此类多组数据时,第一步是确定不同模式之间的对应关系。换句话说,我们应该匹配来自同一对象对应的不同空间的数据。这一问题在单细胞多组学场景中尤其具有挑战性,因为此类数据非常稀疏且维度极高。其次,匹配的单细胞多组学数据稀少且难以收集。此外,由于实验环境的限制,数据通常是高噪声的。为了促进单细胞多组学研究,我们克服了上述挑战,提出了一个新的框架来整合单细胞RNA-seq数据和单细胞ATAC-seq数据。我们的方法可以有效地将上述数据从不同的空间映射到统一空间中的低维流形,从而简化下游对齐和集成。与其他最先进的方法相比,我们的方法在模拟和真实的单细胞数据中都表现得更好。该方法有助于单细胞多组学的研究。对模拟数据进行集成的改进意义重大。 摘要:Muilti-modality data are ubiquitous in biology, especially that we have entered the multi-omics era, when we can measure the same biological object (cell) from different aspects (omics) to provide a more comprehensive insight into the cellular system. When dealing with such multi-omics data, the first step is to determine the correspondence among different modalities. In other words, we should match data from different spaces corresponding to the same object. This problem is particularly challenging in the single-cell multi-omics scenario because such data are very sparse with extremely high dimensions. Secondly, matched single-cell multi-omics data are rare and hard to collect. Furthermore, due to the limitations of the experimental environment, the data are usually highly noisy. To promote the single-cell multi-omics research, we overcome the above challenges, proposing a novel framework to align and integrate single-cell RNA-seq data and single-cell ATAC-seq data. Our approach can efficiently map the above data with high sparsity and noise from different spaces to a low-dimensional manifold in a unified space, making the downstream alignment and integration straightforward. Compared with the other state-of-the-art methods, our method performs better in both simulated and real single-cell data. The proposed method is helpful for the single-cell multi-omics research. The improvement for integration on the simulated data is significant.

半/弱/无/有监督|不确定性|主动学习(7篇)

【1】 Universalizing Weak Supervision 标题:弱监督的通用化 链接:https://arxiv.org/abs/2112.03865

作者:Changho Shin,Winfred Li,Harit Vishwakarma,Nicholas Roberts,Frederic Sala 机构:†Department of Computer Sciences, University of Wisconsin-Madison 摘要:弱监督(WS)框架是一种流行的方法,可以绕过手动标记大型数据集来训练数据饥饿模型。这些方法将多个噪声但廉价获得的标签估计合成为一组高质量的伪标签,用于下游训练。然而,合成技术特定于特定种类的标签,例如二进制标签或序列,并且每种新标签类型都需要手动设计新的合成算法。相反,我们提出了一种通用技术,该技术能够对任何标签类型进行弱监督,同时仍然提供理想的特性,包括实用灵活性、计算效率和理论保证。我们将此技术应用于以前WS框架没有解决的重要问题,包括学习排序、回归和学习双曲流形。理论上,我们的综合方法为学习指数族模型的一个具有挑战性但重要的推广提供了一致估计。在实验上,我们验证了我们的框架,并在不同的环境下显示了基线的改进,包括真实世界的排序和回归问题学习以及双曲流形学习。 摘要:Weak supervision (WS) frameworks are a popular way to bypass hand-labeling large datasets for training data-hungry models. These approaches synthesize multiple noisy but cheaply-acquired estimates of labels into a set of high-quality pseudolabels for downstream training. However, the synthesis technique is specific to a particular kind of label, such as binary labels or sequences, and each new label type requires manually designing a new synthesis algorithm. Instead, we propose a universal technique that enables weak supervision over any label type while still offering desirable properties, including practical flexibility, computational efficiency, and theoretical guarantees. We apply this technique to important problems previously not tackled by WS frameworks including learning to rank, regression, and learning in hyperbolic manifolds. Theoretically, our synthesis approach produces a consistent estimator for learning a challenging but important generalization of the exponential family model. Experimentally, we validate our framework and show improvement over baselines in diverse settings including real-world learning-to-rank and regression problems along with learning on hyperbolic manifolds.

【2】 Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning 标题:用模仿学习和自监督学习创建多模态交互Agent 链接:https://arxiv.org/abs/2112.03763

作者:DeepMind Interactive Agents Team,Josh Abramson,Arun Ahuja,Arthur Brussee,Federico Carnevale,Mary Cassin,Felix Fischer,Petko Georgiev,Alex Goldin,Tim Harley,Felix Hill,Peter C Humphreys,Alden Hung,Jessica Landon,Timothy Lillicrap,Hamza Merzic,Alistair Muldal,Adam Santoro,Guy Scully,Tamara von Glehn,Greg Wayne,Nathaniel Wong,Chen Yan,Rui Zhu 摘要:科幻小说中的一个共同愿景是机器人有朝一日将居住在我们的物理空间中,像我们一样感知世界,协助我们的体力劳动,并通过自然语言与我们交流。在这里,我们研究如何通过简化虚拟环境来设计能够与人类自然交互的人工智能体。我们表明,在模拟世界中对人与人之间的交互进行模仿学习,再加上自我监督学习,足以产生一个多模态交互代理,我们称之为MIA,它在75%的时间内成功地与非对抗性人类交互。我们进一步确定了提高性能的体系结构和算法技术,如分层动作选择。总之,我们的研究结果表明,模拟多模态实时人类行为可以提供一种直接且出人意料的有效方法,向代理人灌输丰富的行为先验知识,然后代理人可以根据特定目的进行微调,从而为交互式机器人或数字助理的能力训练奠定了基础。MIA行为的视频可在https://youtu.be/ZFgRhviF7mY 摘要:A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. We show that imitation learning of human-human interactions in a simulated world, in conjunction with self-supervised learning, is sufficient to produce a multimodal interactive agent, which we call MIA, that successfully interacts with non-adversarial humans 75% of the time. We further identify architectural and algorithmic techniques that improve performance, such as hierarchical action selection. Altogether, our results demonstrate that imitation of multi-modal, real-time human behaviour may provide a straightforward and surprisingly effective means of imbuing agents with a rich behavioural prior from which agents might then be fine-tuned for specific purposes, thus laying a foundation for training capable agents for interactive robots or digital assistants. A video of MIA's behaviour may be found at https://youtu.be/ZFgRhviF7mY

【3】 Unsupervised Learning of Compositional Scene Representations from Multiple Unspecified Viewpoints 标题:从多个未指定视点对组合式场景表示进行无监督学习 链接:https://arxiv.org/abs/2112.03568

作者:Jinyang Yuan,Bin Li,Xiangyang Xue 备注:AAAI 2022 摘要:视觉场景具有极其丰富的多样性,这不仅是因为存在无限多的对象和背景组合,而且还因为同一场景的观测值可能会随着视点的变化而发生很大变化。当从多个视点观察包含多个对象的视觉场景时,人类能够从每个视点以合成方式感知场景,同时在不同视点之间实现所谓的“对象恒定性”,即使确切的视点不详。这种能力对于人类在移动时识别同一物体以及有效地从视觉中学习至关重要。设计具有类似能力的模型很有趣。在本文中,我们考虑了从多个未指定的视点学习合成场景表示的新问题,而不使用任何监督,并提出了一种深度生成模型,该模型将潜在表示分离为视点无关部分和视点依赖部分来解决该问题。为了推断潜在表征,不同视点中包含的信息通过神经网络进行迭代集成。在几个专门设计的合成数据集上的实验表明,该方法能够有效地从多个未指定的视点进行学习。 摘要:Visual scenes are extremely rich in diversity, not only because there are infinite combinations of objects and background, but also because the observations of the same scene may vary greatly with the change of viewpoints. When observing a visual scene that contains multiple objects from multiple viewpoints, humans are able to perceive the scene in a compositional way from each viewpoint, while achieving the so-called "object constancy" across different viewpoints, even though the exact viewpoints are untold. This ability is essential for humans to identify the same object while moving and to learn from vision efficiently. It is intriguing to design models that have the similar ability. In this paper, we consider a novel problem of learning compositional scene representations from multiple unspecified viewpoints without using any supervision, and propose a deep generative model which separates latent representations into a viewpoint-independent part and a viewpoint-dependent part to solve this problem. To infer latent representations, the information contained in different viewpoints is iteratively integrated by neural networks. Experiments on several specifically designed synthetic datasets have shown that the proposed method is able to effectively learn from multiple unspecified viewpoints.

【4】 More layers! End-to-end regression and uncertainty on tabular data with deep learning 标题:更多层!基于深度学习的表格数据端到端回归与不确定性 链接:https://arxiv.org/abs/2112.03566

作者:Ivan Bondarenko 机构: Novosibirsk State University, Huawei Novosibirsk Research Center, Novosibirsk, Russia 备注:12 pages, 5 figures, the described solution is submitted to the Shifts Challenge (see this https URL), the code is available on this https URL 摘要:本文试图分析深度学习在表格数据处理中的有效性。人们认为,决策树及其集成是这一领域的主流方法,而深度神经网络必须满足于计算机视觉等。但深层神经网络是一个构建基于梯度的层次表示的框架,这个关键功能应该能够提供对一般结构化(表格)数据的最佳处理,而不仅仅是图像矩阵和音频频谱图。这个问题是通过Yandex移位挑战(换句话说,Yandex移位天气任务)中天气预测轨迹的棱镜来考虑的。此任务是经典表格数据回归问题的变体。它还与另一个重要问题有关:机器学习中的泛化和不确定性。本文提出了一种解决表格数据不确定性回归问题的端到端算法,该算法基于以下四个思想:1)自规范化神经网络的深度集成,2)作为高斯目标误差分布参数估计的回归,3)分层多任务学习,4)简单的数据预处理。该算法的三个修改分别构成Yandex移动天气挑战赛的前三名排行榜。本文认为这种成功是由于深度学习算法的基本特性,并试图证明这一点。 摘要:This paper attempts to analyze the effectiveness of deep learning for tabular data processing. It is believed that decision trees and their ensembles is the leading method in this domain, and deep neural networks must be content with computer vision and so on. But the deep neural network is a framework for building gradient-based hierarchical representations, and this key feature should be able to provide the best processing of generic structured (tabular) data, not just image matrices and audio spectrograms. This problem is considered through the prism of the Weather Prediction track in the Yandex Shifts challenge (in other words, the Yandex Shifts Weather task). This task is a variant of the classical tabular data regression problem. It is also connected with another important problem: generalization and uncertainty in machine learning. This paper proposes an end-to-end algorithm for solving the problem of regression with uncertainty on tabular data, which is based on the combination of four ideas: 1) deep ensemble of self-normalizing neural networks, 2) regression as parameter estimation of the Gaussian target error distribution, 3) hierarchical multitask learning, and 4) simple data preprocessing. Three modifications of the proposed algorithm form the top-3 leaderboard of the Yandex Shifts Weather challenge respectively. This paper considers that this success has occurred due to the fundamental properties of the deep learning algorithm, and tries to prove this.
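下面用 numpy 粗略示意该方案中的两个关键部件(根据摘要复述的简化版,并非参赛代码):把回归建模为高斯目标误差分布的参数估计(负对数似然损失),以及深度集成的均值/方差融合方式。

```python
import numpy as np

def gaussian_nll(y_true, mu, log_var):
    """把回归视为对高斯目标误差分布 (mu, sigma^2) 的参数估计:负对数似然损失。"""
    return float(np.mean(0.5 * (log_var + (y_true - mu) ** 2 / np.exp(log_var))))

def ensemble_predict(members, x):
    """深度集成的预测融合:总体均值 = 各成员均值的平均;
    总体方差 = 平均的数据方差(aleatoric)+ 成员均值之间的方差(epistemic)。"""
    mus, variances = zip(*[m(x) for m in members])   # 每个成员输出 (mu, sigma^2)
    mus, variances = np.array(mus), np.array(variances)
    mean = mus.mean(axis=0)
    total_var = variances.mean(axis=0) + mus.var(axis=0)
    return mean, total_var

# 示例:三个占位"模型",每个给出带不确定性的气温预测
members = [lambda x, b=b: (x.sum(axis=1) + b, np.full(len(x), 0.5 + 0.1 * b))
           for b in range(3)]
x = np.random.randn(5, 4)
mu, var = ensemble_predict(members, x)
print(gaussian_nll(mu + 1.0, mu, np.log(var)))
```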

【5】 Accurate parameter estimation using scan-specific unsupervised deep learning for relaxometry and MR fingerprinting 标题:使用特定于扫描的无监督深度学习进行弛豫测量和MR指纹的精确参数估计 链接:https://arxiv.org/abs/2112.03815

作者:Mengze Gao,Huihui Ye,Tae Hyung Kim,Zijing Zhang,Seohee So,Berkin Bilgic 机构:Department of Precision Instrument, Tsinghua University, Beijing, China, State Key Laboratory of Modern Optical Instrumentation, College of Optical Science and, Engineering, Zhejiang University, Hangzhou, China, Harvard Medical School, Boston, MA, USA 备注:7 pages, 5 figures, submitted to International Society for Magnetic Resonance in Medicine 2022 摘要:我们提出了一种用于松弛参数估计的无监督卷积神经网络(CNN)。该网络结合了信号松弛和布洛赫模拟,同时利用了剩余学习和相邻体素之间的空间关系。在多回波T2和T2*标测的数值模拟和活体数据中,与标准参数估计方法相比,量化精度和对噪声的鲁棒性显著提高。所提出的网络与子空间建模和来自高度欠采样数据的MR指纹(MRF)相结合允许高质量T1和T2映射。 摘要:We propose an unsupervised convolutional neural network (CNN) for relaxation parameter estimation. This network incorporates signal relaxation and Bloch simulations while taking advantage of residual learning and spatial relations across neighboring voxels. Quantification accuracy and robustness to noise is shown to be significantly improved compared to standard parameter estimation methods in numerical simulations and in vivo data for multi-echo T2 and T2* mapping. The combination of the proposed network with subspace modeling and MR fingerprinting (MRF) from highly undersampled data permits high quality T1 and T2 mapping.

【6】 Noise Distribution Adaptive Self-Supervised Image Denoising using Tweedie Distribution and Score Matching 标题:基于Tweedie分布和分数匹配的噪声分布自适应自监督图像去噪 链接:https://arxiv.org/abs/2112.03696

作者:Kwanyoung Kim,Taesung Kwon,Jong Chul Ye 机构: Department of Bio and Brain Engineering, Kim Jaechul Graduate School of AI, Deptartment of Mathematical Sciences, Korea Advanced Institute of Science and Technology (KAIST) 摘要:Tweedie分布是指数色散模型的一种特例,在经典统计学中常用作广义线性模型的分布。在这里,我们揭示了Tweedie分布在现代深度学习时代也起着关键作用,导致了一个独立于分布的自监督图像去噪公式,而没有干净的参考图像。具体地说,通过结合最近的Noise2Score自监督图像去噪方法和Tweedie分布的鞍点近似,我们可以提供一个通用的封闭形式去噪公式,该公式可用于大类噪声分布,而不需要知道潜在的噪声分布。与原始Noise2Score相似,新方法由两个连续步骤组成:使用扰动噪声图像进行分数匹配,然后通过分布无关的Tweedie公式得到封闭形式的图像去噪公式。这也提出了一个系统的算法来估计噪声模型和噪声参数为给定的噪声图像数据集。通过大量实验,我们证明了该方法能够准确估计噪声模型和参数,并在基准数据集和真实数据集上提供了最先进的自监督图像去噪性能。 摘要:Tweedie distributions are a special case of exponential dispersion models, which are often used in classical statistics as distributions for generalized linear models. Here, we reveal that Tweedie distributions also play key roles in modern deep learning era, leading to a distribution independent self-supervised image denoising formula without clean reference images. Specifically, by combining with the recent Noise2Score self-supervised image denoising approach and the saddle point approximation of Tweedie distribution, we can provide a general closed-form denoising formula that can be used for large classes of noise distributions without ever knowing the underlying noise distribution. Similar to the original Noise2Score, the new approach is composed of two successive steps: score matching using perturbed noisy images, followed by a closed form image denoising formula via distribution-independent Tweedie's formula. This also suggests a systematic algorithm to estimate the noise model and noise parameters for a given noisy image data set. Through extensive experiments, we demonstrate that the proposed method can accurately estimate noise models and parameters, and provide the state-of-the-art self-supervised image denoising performance in the benchmark dataset and real-world dataset.
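下面用 numpy 演示 Tweedie 公式在高斯噪声这一特例下的去噪形式(这是公开的经典结果;示例中的解析分数函数只是为了验证公式,实际方法中分数由 Noise2Score 式的网络从噪声图像中估计):

```python
import numpy as np

def tweedie_denoise_gaussian(y, score_fn, sigma):
    """Tweedie 公式(高斯噪声情形):x_hat = y + sigma^2 * score(y),
    其中 score(y) 是噪声观测对数密度的梯度。"""
    return y + sigma ** 2 * score_fn(y)

# 示例:当干净信号本身服从 N(mu, tau^2) 时,分数有解析形式,可用来验证公式
mu, tau, sigma = 0.5, 0.1, 0.2
x = np.random.normal(mu, tau, size=10000)              # "干净"像素
y = x + np.random.normal(0, sigma, size=x.shape)        # 加噪观测
score = lambda v: (mu - v) / (tau ** 2 + sigma ** 2)    # y 的边缘分布为 N(mu, tau^2 + sigma^2)
x_hat = tweedie_denoise_gaussian(y, score, sigma)
print(np.mean((y - x) ** 2), np.mean((x_hat - x) ** 2))  # 去噪后的均方误差应明显更小
```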

【7】 Organ localisation using supervised and semi supervised approaches combining reinforcement learning with imitation learning 标题:结合强化学习和模仿学习的监督和半监督器官定位方法 链接:https://arxiv.org/abs/2112.03276

作者:Sankaran Iyer,Alan Blair,Laughlin Dawes,Daniel Moses,Christopher White,Arcot Sowmya 机构:School of Computer Science and Engineering, University of New South Wales Kensington, NSW , Department of Medical Imaging, Prince of Wales Hospital, NSW, Australia, Department of Endocrinology and Metabolism, Prince of Wales Hospital, NSW, Australia 备注:16 pages, 12 figures 摘要:计算机辅助诊断通常需要在放射学扫描中分析感兴趣区域(ROI),ROI可能是一个器官或亚器官。尽管深度学习算法的性能优于其他方法,但它们依赖于大量注释数据的可用性。出于解决这一局限性的需要,本文提出了一种基于监督和半监督学习的多器官定位和检测方法。它借鉴了作者先前在CT图像中定位胸椎和腰椎区域的工作。该方法生成感兴趣器官的六个边界框,然后将它们融合到一个边界框中。使用监督和半监督学习(SSL)对CT图像中的脾脏、左肾和右肾进行定位的实验结果表明,与其他最先进的方法相比,使用更小的数据集和更少的注释可以解决数据限制问题。使用三种不同的标记和未标记数据(即30:70、35:65、40:60)分别对腰椎、脾脏、左肾和右肾的SSL性能进行评估。结果表明,SSL提供了一种可行的替代方案,特别是在难以获得注释数据的医学成像中。 摘要:Computer aided diagnostics often requires analysis of a region of interest (ROI) within a radiology scan, and the ROI may be an organ or a suborgan. Although deep learning algorithms have the ability to outperform other methods, they rely on the availability of a large amount of annotated data. Motivated by the need to address this limitation, an approach to localisation and detection of multiple organs based on supervised and semi-supervised learning is presented here. It draws upon previous work by the authors on localising the thoracic and lumbar spine region in CT images. The method generates six bounding boxes of organs of interest, which are then fused to a single bounding box. The results of experiments on localisation of the Spleen, Left and Right Kidneys in CT Images using supervised and semi supervised learning (SSL) demonstrate the ability to address data limitations with a much smaller data set and fewer annotations, compared to other state-of-the-art methods. The SSL performance was evaluated using three different mixes of labelled and unlabelled data (i.e.30:70,35:65,40:60) for each of lumbar spine, spleen left and right kidneys respectively. The results indicate that SSL provides a workable alternative especially in medical imaging where it is difficult to obtain annotated data.

迁移|Zero/Few/One-Shot|自适应(2篇)

【1】 MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance 标题:MESA:用于安全适配和容错的离线Meta-RL 链接:https://arxiv.org/abs/2112.03575

作者:Michael Luo,Ashwin Balakrishna,Brijen Thananjeyan,Suraj Nair,Julian Ibarz,Jie Tan,Chelsea Finn,Ion Stoica,Ken Goldberg 备注:None 摘要:安全探索对于在风险敏感环境中使用强化学习(RL)至关重要。最近的工作学习了风险度量,它度量违反约束的概率,然后可用于实现安全。然而,学习此类风险度量需要与环境进行大量交互,从而导致学习过程中过度违反约束。此外,这些措施不容易转移到新的环境中。我们将安全探索视为离线meta-RL问题,其目标是利用一系列环境中的安全和不安全行为示例,以快速将学到的风险度量适应具有以前未发现动态的新环境。然后,我们提出了安全适应元学习(MESA),一种元学习方法,一种安全RL的风险度量方法。跨5个连续控制域的模拟实验表明,MESA可以利用一系列不同环境中的脱机数据,在保持任务性能的同时,将看不见环境中的约束冲突减少多达2倍。看见https://tinyurl.com/safe-meta-rl 代码和补充资料。 摘要:Safe exploration is critical for using reinforcement learning (RL) in risk-sensitive environments. Recent work learns risk measures which measure the probability of violating constraints, which can then be used to enable safety. However, learning such risk measures requires significant interaction with the environment, resulting in excessive constraint violations during learning. Furthermore, these measures are not easily transferable to new environments. We cast safe exploration as an offline meta-RL problem, where the objective is to leverage examples of safe and unsafe behavior across a range of environments to quickly adapt learned risk measures to a new environment with previously unseen dynamics. We then propose MEta-learning for Safe Adaptation (MESA), an approach for meta-learning a risk measure for safe RL. Simulation experiments across 5 continuous control domains suggest that MESA can leverage offline data from a range of different environments to reduce constraint violations in unseen environments by up to a factor of 2 while maintaining task performance. See https://tinyurl.com/safe-meta-rl for code and supplementary material.

【2】 Label Hallucination for Few-Shot Classification 标题:面向小样本(Few-Shot)分类的标签幻觉 链接:https://arxiv.org/abs/2112.03340

作者:Yiren Jian,Lorenzo Torresani 机构:Dartmouth College 备注:Accepted by AAAI 2022. Code is available: this https URL 摘要:小样本分类需要将从大型带标注基础数据集中学到的知识加以调整,以识别全新的未见类别,每个类别仅由少量带标签的示例表示。在这种情况下,先在大数据集上预训练高容量网络,再在少量示例上微调,会导致严重的过拟合。与此同时,在从大型标注数据集中学到的"冻结"特征之上训练一个简单的线性分类器,又无法使模型适应新类别的属性,实际上造成欠拟合。在本文中,我们提出了这两种流行策略之外的替代方法。首先,我们的方法使用在新类别上训练的线性分类器对整个大型数据集进行伪标注。这相当于在大型数据集中"幻觉"出新类别,尽管这些新类别并不存在于基础数据集中(新类别与基类别不相交)。然后,除了新数据集上的标准交叉熵损失外,该方法还利用伪标注基础样本上的蒸馏损失对整个模型进行微调。这一步骤有效地训练网络识别对新类别识别有用的上下文和外观线索,且利用了完整的大规模基础数据集,从而克服了小样本学习固有的数据稀缺问题。尽管方法简单,我们表明该方法在四个成熟的小样本分类基准上优于最先进的方法。 摘要:Few-shot classification requires adapting knowledge learned from a large annotated base dataset to recognize novel unseen classes, each represented by few labeled examples. In such a scenario, pretraining a network with high capacity on the large dataset and then finetuning it on the few examples causes severe overfitting. At the same time, training a simple linear classifier on top of "frozen" features learned from the large labeled dataset fails to adapt the model to the properties of the novel classes, effectively inducing underfitting. In this paper we propose an alternative approach to both of these two popular strategies. First, our method pseudo-labels the entire large dataset using the linear classifier trained on the novel classes. This effectively "hallucinates" the novel classes in the large dataset, despite the novel categories not being present in the base database (novel and base classes are disjoint). Then, it finetunes the entire model with a distillation loss on the pseudo-labeled base examples, in addition to the standard cross-entropy loss on the novel dataset. This step effectively trains the network to recognize contextual and appearance cues that are useful for the novel-category recognition but using the entire large-scale base dataset and thus overcoming the inherent data-scarcity problem of few-shot learning. Despite the simplicity of the approach, we show that our method outperforms the state-of-the-art on four well-established few-shot classification benchmarks.
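下面给出上述"先伪标注、再联合微调"流程的一个极简PyTorch示意,仅用于说明两项损失的组合方式;其中的骨干网络、随机数据、温度T与损失权重均为示意性假设,并非原论文的具体实现或超参数。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 极简示意:先用在新类上训练的分类器给基础集打软伪标签("幻觉"出新类),
# 再用 "新类交叉熵 + 伪标签蒸馏" 联合微调整个模型。所有设置均为示意性假设。
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
classifier = nn.Linear(128, 5)          # 假设有 5 个新类
model = nn.Sequential(backbone, classifier)

novel_x, novel_y = torch.randn(20, 3, 32, 32), torch.randint(0, 5, (20,))
base_x = torch.randn(200, 3, 32, 32)    # 基础集图像(没有新类标签)

T = 4.0                                  # 蒸馏温度(示意值)
with torch.no_grad():                    # 第一步:对基础集生成软伪标签
    pseudo_soft = F.softmax(model(base_x) / T, dim=1)

opt = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(5):                       # 第二步:联合微调
    ce = F.cross_entropy(model(novel_x), novel_y)                 # 新类交叉熵
    kd = F.kl_div(F.log_softmax(model(base_x) / T, dim=1),
                  pseudo_soft, reduction="batchmean") * (T * T)   # 伪标签蒸馏
    loss = ce + kd
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```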

强化学习(4篇)

【1】 Attention-Based Model and Deep Reinforcement Learning for Distribution of Event Processing Tasks 标题:基于注意力的事件处理任务分配模型和深度强化学习 链接:https://arxiv.org/abs/2112.03835

作者:A. Mazayev,F. Al-Tam,N. Correia 机构:Centre of Electronics, Optoelectronics and Telecommunications (CEOT), University of Algarve,-, Faro, Portugal, Departement of Computer Science, Aberystwyth University, United Kingdom 备注:19 pages, 6 figures 摘要:事件处理是动态响应物联网(IoT)的基石。该领域的最新方法基于代表性状态转移(REST)原则,该原则允许将事件处理任务放置在遵循相同原则的任何设备上。但是,任务应该在边缘设备之间适当分配,以确保资源的公平利用和无缝执行。本文研究如何使用深度学习来公平分配任务。提出了一种基于注意的神经网络模型,用于在不同场景下生成有效的负载平衡解决方案。该模型基于Transformer和指针网络体系结构,并采用优势角色-批评家强化学习算法进行训练。该模型设计成可扩展到事件处理任务的数量和边缘设备的数量,而无需重新调整超参数甚至重新训练。大量的实验结果表明,该模型在许多关键性能指标上优于传统的启发式算法。通用设计和获得的结果表明,所提出的模型可以潜在地应用于其他几种负载平衡问题变体,这使得该方案由于其可扩展性和效率而成为现实场景中使用的一个有吸引力的选项。 摘要:Event processing is the cornerstone of the dynamic and responsive Internet of Things (IoT). Recent approaches in this area are based on representational state transfer (REST) principles, which allow event processing tasks to be placed at any device that follows the same principles. However, the tasks should be properly distributed among edge devices to ensure fair resources utilization and guarantee seamless execution. This article investigates the use of deep learning to fairly distribute the tasks. An attention-based neural network model is proposed to generate efficient load balancing solutions under different scenarios. The proposed model is based on the Transformer and Pointer Network architectures, and is trained by an advantage actor-critic reinforcement learning algorithm. The model is designed to scale to the number of event processing tasks and the number of edge devices, with no need for hyperparameters re-tuning or even retraining. Extensive experimental results show that the proposed model outperforms conventional heuristics in many key performance indicators. The generic design and the obtained results show that the proposed model can potentially be applied to several other load balancing problem variations, which makes the proposal an attractive option to be used in real-world scenarios due to its scalability and efficiency.

【2】 Godot Reinforcement Learning Agents 标题:戈多强化学习代理 链接:https://arxiv.org/abs/2112.03636

作者:Edward Beeching,Jilles Debangoye,Olivier Simonin,Christian Wolf 机构:INRIA Chroma team, CITI Laboratory. INSA-Lyon, France., Université de Lyon, INSA-Lyon, LIRIS, CNRS, France. 摘要:我们介绍了 Godot 强化学习(RL)智能体,这是一个用于在 Godot 游戏引擎中开发环境和智能体的开源接口。Godot RL Agents 接口允许在具有挑战性的 2D 和 3D 环境中,使用各种同策略(on-policy)和异策略(off-policy)深度RL算法来设计、创建和学习智能体行为。我们提供了标准的 Gym 接口,并带有用于在 Ray RLlib 和 Stable Baselines RL 框架中进行学习的包装器。这使用户可以使用超过20种最先进的同策略、异策略和多智能体 RL 算法。该框架是一个多功能工具,允许研究人员和游戏设计师创建具有离散、连续和混合动作空间的环境。该接口性能相对较高:在4个CPU核上并行时,高端笔记本电脑上每秒可进行12k次交互。概述视频见:https://youtu.be/g1MlZSFqIj4 摘要:We present Godot Reinforcement Learning (RL) Agents, an open-source interface for developing environments and agents in the Godot Game Engine. The Godot RL Agents interface allows the design, creation and learning of agent behaviors in challenging 2D and 3D environments with various on-policy and off-policy Deep RL algorithms. We provide a standard Gym interface, with wrappers for learning in the Ray RLlib and Stable Baselines RL frameworks. This allows users access to over 20 state of the art on-policy, off-policy and multi-agent RL algorithms. The framework is a versatile tool that allows researchers and game designers the ability to create environments with discrete, continuous and mixed action spaces. The interface is relatively performant, with 12k interactions per second on a high end laptop computer, when parallelized on 4 CPU cores. An overview video is available here: https://youtu.be/g1MlZSFqIj4
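摘要中提到的"标准 Gym 接口"即下述风格的环境交互约定。下面用一个自定义玩具环境示意这种接口的用法;真实的 Godot RL Agents 环境类名与构造方式请以其官方仓库为准,此处仅为通用 Gym 用法的示意(采用 2021 年前后的经典 gym API)。

```python
import gym
import numpy as np

# 一个自定义的玩具环境,实现标准 Gym 接口:reset() / step(action)。
# 真实的 Godot 环境由 Godot RL Agents 的包装器提供,此处仅示意接口形式。
class ToyEnv(gym.Env):
    def __init__(self):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.observation_space.sample()

    def step(self, action):
        self.t += 1
        obs = self.observation_space.sample()
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 10
        return obs, reward, done, {}

env = ToyEnv()
obs, done, total = env.reset(), False, 0.0
while not done:                                           # 随机策略跑一条轨迹
    obs, r, done, info = env.step(env.action_space.sample())
    total += r
print("episode return:", total)
```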

【3】 Federated Deep Reinforcement Learning for the Distributed Control of NextG Wireless Networks 标题:用于NextG无线网络分布式控制的联邦深度强化学习 链接:https://arxiv.org/abs/2112.03465

作者:Peyman Tehrani,Francesco Restuccia,Marco Levorato 机构:†Donald Bren School of Information and Computer Sciences, University of California at Irvine, United States, ∗Department of Electrical and Computer Engineering, Northeastern University, United States 备注:6pages 摘要:下一代(NextG)网络预计将支持要求苛刻的触觉互联网应用,如增强现实和连接的自动车辆。尽管最近的创新带来了更大链路容量的前景,但它们对环境的敏感性和不稳定的性能违背了传统的基于模型的控制原理。零接触数据驱动方法可以提高网络适应当前运行条件的能力。强化学习(RL)算法等工具可以仅基于观测历史建立最优控制策略。具体地说,使用深度神经网络(DNN)作为预测器的深度RL(DRL)即使在复杂环境和高维输入下也能获得良好的性能。然而,DRL模型的训练需要大量的数据,这可能会限制其对不断变化的底层环境统计数据的适应性。此外,无线网络本质上是分布式系统,集中式DRL方法将需要过度的数据交换,而完全分布式方法可能导致收敛速度较慢和性能下降。在本文中,为了应对这些挑战,我们提出了一种DRL的联合学习(FL)方法,我们称之为联合DRL(F-DRL),其中基站(BS)通过仅共享模型权重而非训练数据来协作训练嵌入式DNN。我们评估了基于价值和基于策略的两个不同版本的F-DRL,并展示了它们与分布式和集中式DRL相比所实现的优越性能。 摘要:Next Generation (NextG) networks are expected to support demanding tactile internet applications such as augmented reality and connected autonomous vehicles. Whereas recent innovations bring the promise of larger link capacity, their sensitivity to the environment and erratic performance defy traditional model-based control rationales. Zero-touch data-driven approaches can improve the ability of the network to adapt to the current operating conditions. Tools such as reinforcement learning (RL) algorithms can build optimal control policy solely based on a history of observations. Specifically, deep RL (DRL), which uses a deep neural network (DNN) as a predictor, has been shown to achieve good performance even in complex environments and with high dimensional inputs. However, the training of DRL models require a large amount of data, which may limit its adaptability to ever-evolving statistics of the underlying environment. Moreover, wireless networks are inherently distributed systems, where centralized DRL approaches would require excessive data exchange, while fully distributed approaches may result in slower convergence rates and performance degradation. In this paper, to address these challenges, we propose a federated learning (FL) approach to DRL, which we refer to federated DRL (F-DRL), where base stations (BS) collaboratively train the embedded DNN by only sharing models' weights rather than training data. We evaluate two distinct versions of F-DRL, value and policy based, and show the superior performance they achieve compared to distributed and centralized DRL.
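下面用一个极简的 FedAvg 风格示意说明"基站只共享模型权重、不共享训练数据"这一核心思想;其中基站数量、网络结构与本地数据均为示意性假设,论文中针对基于价值/基于策略的 F-DRL 的具体聚合与训练细节可能不同。

```python
import copy
import torch
import torch.nn as nn

# 每一轮:各基站在本地数据上更新模型 -> 只上传权重 -> 服务器做参数平均 -> 下发全局模型。
def local_update(model, x, y, steps=5, lr=0.01):
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return model.state_dict()

def federated_average(state_dicts):
    avg = copy.deepcopy(state_dicts[0])
    for k in avg:
        avg[k] = torch.stack([sd[k] for sd in state_dicts]).mean(dim=0)
    return avg

global_model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
bs_data = [(torch.randn(64, 8), torch.randn(64, 1)) for _ in range(3)]  # 3 个基站的本地数据(示意)

for round_ in range(5):
    local_states = [local_update(global_model, x, y) for x, y in bs_data]
    global_model.load_state_dict(federated_average(local_states))
print("done")
```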

【4】 First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach 标题:线性函数逼近强化学习中的一阶遗憾:一种稳健估计方法 链接:https://arxiv.org/abs/2112.03432

作者:Andrew Wagenmaker,Yifang Chen,Max Simchowitz,Simon S. Du,Kevin Jamieson 摘要:获得一阶遗憾界——即并非最坏情况、而是随最优策略在给定实例上的性能而缩放的遗憾界——是序贯决策中的一个核心问题。虽然这类界在许多设定下已经存在,但在具有大状态空间的强化学习中却一直难以获得。在这项工作中,我们填补了这一空白,并表明在具有大状态空间的强化学习(即线性MDP设定)中,有可能获得缩放为$\mathcal{O}(\sqrt{V_1^\star K})$的遗憾。这里$V_1^\star$是最优策略的值,$K$是回合(episode)数。我们证明了现有的基于最小二乘估计的技术不足以获得这一结果;作为替代,我们开发了一种新的基于鲁棒Catoni均值估计量的鲁棒自归一化集中不等式,其本身可能具有独立的研究价值。 摘要:Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance -- is a core question in sequential decision-making. While such bounds exist in many settings, they have proven elusive in reinforcement learning with large state spaces. In this work we address this gap, and show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces, namely the linear MDP setting. Here $V_1^\star$ is the value of the optimal policy and $K$ is the number of episodes. We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result, and instead develop a novel robust self-normalized concentration bound based on the robust Catoni mean estimator, which may be of independent interest.
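作为阅读辅助,下面用常见记号写出"一阶遗憾界"的含义;其中遗憾的定义采用 episodic RL 的通用写法,对数因子与常数从略,并非对原文定理的精确转述。

```latex
% K 个回合的累计遗憾(episodic RL 的常用定义):
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K}\Bigl( V_1^{\star}(s_1^{k}) - V_1^{\pi_k}(s_1^{k}) \Bigr).
% 最坏情况界通常形如 \widetilde{\mathcal{O}}(\sqrt{K});
% 一阶(first-order)界则随最优策略的值缩放:
\mathrm{Regret}(K) \;\le\; \widetilde{\mathcal{O}}\!\bigl( \sqrt{V_1^{\star}\, K} \bigr),
% 因此当 V_1^{\star} 很小(最优策略本身回报很低)时,一阶界显著优于最坏情况界。
```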

元学习(1篇)

【1】 Noether Networks: Meta-Learning Useful Conserved Quantities 标题:Noether网络:元学习有用的守恒量 链接:https://arxiv.org/abs/2112.03321

作者:Ferran Alet,Dylan Doblar,Allan Zhou,Joshua Tenenbaum,Kenji Kawaguchi,Chelsea Finn 机构:MIT,Stanford University,National University of Singapore 备注:Accepted to NeurIPS '21. The first two authors contributed equally 摘要:机器学习(ML)的进步源于数据可用性、计算资源和归纳偏差的恰当编码三者的结合。有用的偏差通常利用预测问题中的对称性,例如依赖平移等变性的卷积网络。自动发现这些有用的对称性有望极大地提高ML系统的性能,但目前仍然是一个挑战。在这项工作中,我们专注于序列预测问题,并从Noether定理中获得启发,将寻找归纳偏差的问题归约为元学习有用守恒量的问题。我们提出了Noether网络:一种新型架构,其中元学习得到的守恒损失在预测函数内部被优化。我们从理论和实验上证明,Noether网络能够提高预测质量,并为发现序列问题中的归纳偏差提供了一个通用框架。 摘要:Progress in machine learning (ML) stems from a combination of data availability, computational resources, and an appropriate encoding of inductive biases. Useful biases often exploit symmetries in the prediction problem, such as convolutional networks relying on translation equivariance. Automatically discovering these useful symmetries holds the potential to greatly improve the performance of ML systems, but still remains a challenge. In this work, we focus on sequential prediction problems and take inspiration from Noether's theorem to reduce the problem of finding inductive biases to meta-learning useful conserved quantities. We propose Noether Networks: a new type of architecture where a meta-learned conservation loss is optimized inside the prediction function. We show, theoretically and experimentally, that Noether Networks improve prediction quality, providing a general framework for discovering inductive biases in sequential problems.
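下面给出"在预测函数内部优化元学习守恒损失"这一思想的一个极简PyTorch示意;其中网络结构、守恒量维度、内循环步数与学习率均为示意性假设,仅用于说明机制,并非原论文实现。

```python
import torch
import torch.nn as nn

# g 表示元学习得到的"守恒量"嵌入,f 为一步预测器。
# 做预测时,在内部对预测结果做少量梯度步,使 g(x_t) 与 g(x_pred) 尽量一致(守恒损失)。
class NoetherSketch(nn.Module):
    def __init__(self, dim=16, n_conserved=4, inner_steps=3, inner_lr=0.1):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
        self.g = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, n_conserved))
        self.inner_steps, self.inner_lr = inner_steps, inner_lr

    def forward(self, x_t):
        x_pred = self.f(x_t)                      # 初始预测
        g_target = self.g(x_t).detach()           # 输入状态的守恒量
        for _ in range(self.inner_steps):         # 内循环:最小化守恒损失
            x_pred = x_pred.detach().requires_grad_(True)
            conservation_loss = ((self.g(x_pred) - g_target) ** 2).mean()
            grad, = torch.autograd.grad(conservation_loss, x_pred)
            x_pred = x_pred - self.inner_lr * grad
        return x_pred

# 外层训练照常用预测误差(如 MSE)更新 f 和 g 的参数;此处仅示意一次前向计算。
model = NoetherSketch()
x_t, x_next = torch.randn(8, 16), torch.randn(8, 16)
pred = model(x_t)
loss = ((pred - x_next) ** 2).mean()
print(float(loss))
```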

符号|符号学习(1篇)

【1】 Neuro-Symbolic Inductive Logic Programming with Logical Neural Networks 标题:基于逻辑神经网络的神经符号归纳逻辑程序设计 链接:https://arxiv.org/abs/2112.03324

作者:Prithviraj Sen,Breno W. S. R. de Carvalho,Ryan Riegel,Alexander Gray 机构:IBM Research 摘要:最近关于神经符号归纳逻辑编程的研究已经产生了一些有希望的方法,可以从嘈杂的真实数据中学习解释性规则。虽然一些建议使用模糊逻辑或实值逻辑中的可微算子来近似逻辑算子,这些算子是无参数的,因此降低了它们拟合数据的能力,但其他方法只是松散地基于逻辑,因此很难解释所学的“规则”。在本文中,我们提出了学习规则和最近提出的逻辑神经网络(LNN)。与其他方法相比,LNN提供了与经典布尔逻辑的强大连接,从而允许精确解释学习的规则,同时包含可通过基于梯度的优化进行训练的参数,以有效拟合数据。我们将LNNs扩展为一阶逻辑中的规则。我们在标准基准测试任务上的实验证实,LNN规则具有高度的可解释性,并且由于其灵活的参数化,可以达到相当或更高的精度。 摘要:Recent work on neuro-symbolic inductive logic programming has led to promising approaches that can learn explanatory rules from noisy, real-world data. While some proposals approximate logical operators with differentiable operators from fuzzy or real-valued logic that are parameter-free thus diminishing their capacity to fit the data, other approaches are only loosely based on logic making it difficult to interpret the learned "rules". In this paper, we propose learning rules with the recently proposed logical neural networks (LNN). Compared to others, LNNs offer strong connection to classical Boolean logic thus allowing for precise interpretation of learned rules while harboring parameters that can be trained with gradient-based optimization to effectively fit the data. We extend LNNs to induce rules in first-order logic. Our experiments on standard benchmarking tasks confirm that LNN rules are highly interpretable and can achieve comparable or higher accuracy due to their flexible parameterization.

分层学习(1篇)

【1】 Combining Learning from Human Feedback and Knowledge Engineering to Solve Hierarchical Tasks in Minecraft 标题:人类反馈学习与知识工程相结合解决“我的世界”中的分层任务 链接:https://arxiv.org/abs/2112.03482

作者:Vinicius G. Goecks,Nicholas Waytowich,David Watkins,Bharat Prakash 机构:Army Research Laboratory, Aberdeen Proving Ground, Maryland, USA, Columbia University, New York City, New York, USA, University of Maryland, Baltimore, Maryland, USA 备注:Submitted to the AAAI 2022 Spring Symposium on Machine Learning and Knowledge Engineering for Hybrid Intelligence (AAAI-MAKE 2022) 摘要:现实世界中人们感兴趣的任务通常仅由人类可读的描述粗略定义,且除非由人类设计者显式给出,否则没有预定义的奖励信号。相反,数据驱动算法通常被设计用于解决特定的、狭义定义的任务,并由驱动智能体学习的性能指标来引导。在这项工作中,我们介绍了在 2021 NeurIPS 竞赛 MineRL BASALT 挑战赛(在 Minecraft 中从人类反馈中学习)中获得第一名并被评为最具人类风格智能体的解决方案;该挑战赛要求参赛者利用人类数据解决四个仅由自然语言描述定义、且没有奖励函数的任务。我们的方法使用现有的人类演示数据训练用于导航的模仿学习策略,并使用额外的人类反馈训练图像分类器。随后,这些模块连同估计的里程计地图被组合成一个状态机;该状态机基于人类对任务的知识设计,将任务按自然层次分解,并控制学习智能体在任一时刻应遵循的宏观行为。我们将这种混合智能方法与端到端机器学习方案和纯工程方案进行比较,并由人工评估者进行评判。代码库见 https://github.com/viniciusguigo/kairos_minerl_basalt 。 摘要:Real-world tasks of interest are generally poorly defined by human-readable descriptions and have no pre-defined reward signals unless it is defined by a human designer. Conversely, data-driven algorithms are often designed to solve a specific, narrowly defined, task with performance metrics that drives the agent's learning. In this work, we present the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge: Learning from Human Feedback in Minecraft, which challenged participants to use human data to solve four tasks defined only by a natural language description and no reward function. Our approach uses the available human demonstration data to train an imitation learning policy for navigation and additional human feedback to train an image classifier. These modules, together with an estimated odometry map, are then combined into a state-machine designed based on human knowledge of the tasks that breaks them down in a natural hierarchy and controls which macro behavior the learning agent should follow at any instant. We compare this hybrid intelligence approach to both end-to-end machine learning and pure engineered solutions, which are then judged by human evaluators. Codebase is available at https://github.com/viniciusguigo/kairos_minerl_basalt.
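下面用几行 Python 示意摘要中描述的状态机式宏观行为控制:由感知模块(图像分类器、里程计估计)的输出驱动状态转移,再由状态决定交给哪个低层模块执行。其中的状态名、转移条件与模块名称均为示意性假设,并非原代码库的真实接口。

```python
from dataclasses import dataclass

# 感知模块输出的简化表示:分类器判断是否看到目标,里程计/地图判断是否接近目标。
@dataclass
class Perception:
    goal_visible: bool
    near_goal: bool

class MacroStateMachine:
    def __init__(self):
        self.state = "EXPLORE"

    def step(self, obs: Perception) -> str:
        # 状态转移:探索 -> 接近目标 -> 完成任务
        if self.state == "EXPLORE" and obs.goal_visible:
            self.state = "GO_TO_GOAL"
        elif self.state == "GO_TO_GOAL" and obs.near_goal:
            self.state = "FINISH_TASK"
        # 由当前状态决定把控制权交给哪个低层模块(如模仿学习导航策略或脚本化动作)
        return {"EXPLORE": "navigate_randomly",
                "GO_TO_GOAL": "navigate_to_goal",
                "FINISH_TASK": "scripted_finish"}[self.state]

sm = MacroStateMachine()
for obs in [Perception(False, False), Perception(True, False), Perception(True, True)]:
    print(sm.step(obs))
```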

医学相关(1篇)

【1】 Hard Sample Aware Noise Robust Learning for Histopathology Image Classification 标题:硬样本感知噪声鲁棒学习在组织病理学图像分类中的应用 链接:https://arxiv.org/abs/2112.03694

作者:Chuang Zhu,Wenkai Chen,Ting Peng,Ying Wang,Mulan Jin 机构: Peng are with the School of Arti-ficial Intelligence, Beijing University of Posts and Telecommunica-tions 备注:14 pages, 20figures, IEEE Transactions on Medical Imaging 摘要:基于深度学习的组织病理学图像分类是帮助医生提高癌症诊断准确性和及时性的关键技术。然而,在复杂的人工标注过程中,标签噪声往往是不可避免的,从而误导了分类模型的训练。在这项工作中,我们介绍了一种新的硬样本感知噪声鲁棒学习方法用于组织病理学图像分类。为了区分信息性硬样本和有害噪声样本,我们利用样本训练历史建立了易/硬/噪声(EHN)检测模型。然后,我们将EHN集成到一个自训练结构中,通过逐步标记校正来降低噪声率。利用获得的几乎干净的数据集,我们进一步提出了一种噪声抑制和硬增强(NSHE)方案来训练噪声鲁棒模型。与以前的工作相比,我们的方法可以节省更多的干净样本,并且可以直接应用于真实的有噪声数据集场景,而不需要使用干净的子集。实验结果表明,无论是在合成数据集还是在真实噪声数据集,该方法都优于目前最新的方法。源代码和数据可在https://github.com/bupt-ai-cz/HSA-NRL/. 摘要:Deep learning-based histopathology image classification is a key technique to help physicians in improving the accuracy and promptness of cancer diagnosis. However, the noisy labels are often inevitable in the complex manual annotation process, and thus mislead the training of the classification model. In this work, we introduce a novel hard sample aware noise robust learning method for histopathology image classification. To distinguish the informative hard samples from the harmful noisy ones, we build an easy/hard/noisy (EHN) detection model by using the sample training history. Then we integrate the EHN into a self-training architecture to lower the noise rate through gradually label correction. With the obtained almost clean dataset, we further propose a noise suppressing and hard enhancing (NSHE) scheme to train the noise robust model. Compared with the previous works, our method can save more clean samples and can be directly applied to the real-world noisy dataset scenario without using a clean subset. Experimental results demonstrate that the proposed scheme outperforms the current state-of-the-art methods in both the synthetic and real-world noisy datasets. The source code and data are available at https://github.com/bupt-ai-cz/HSA-NRL/.
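下面用一个小的NumPy示意说明"利用样本训练历史区分易/难/噪声样本"的基本思路;其中的损失轨迹为合成数据,阈值规则也只是为说明思路而做的假设,原论文的EHN检测模型远比这更完整。

```python
import numpy as np

# 思路:记录每个样本在多个训练轮次中的损失轨迹,
# 损失低 => 易样本;损失高但随训练下降 => 有信息量的难样本;损失高且几乎不下降 => 疑似噪声标签。
rng = np.random.default_rng(0)
n_samples, n_epochs = 1000, 20
loss_history = rng.gamma(shape=2.0, scale=0.3, size=(n_samples, n_epochs))
loss_history[:50] += 1.5                        # 人为构造 50 个"疑似噪声标签"样本

mean_loss = loss_history.mean(axis=1)
trend = loss_history[:, :5].mean(1) - loss_history[:, -5:].mean(1)   # 前后损失之差,越大说明下降越多

easy  = mean_loss < np.quantile(mean_loss, 0.5)
noisy = (~easy) & (trend < 0.05)                # 损失高且几乎不下降
hard  = (~easy) & (~noisy)                      # 损失高但仍在下降

print(easy.sum(), hard.sum(), noisy.sum())
# 后续可仿照论文思路:修正或丢弃噪声样本标签,同时在训练中强化难样本。
```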

蒸馏|知识提取(1篇)

【1】 Safe Distillation Box 标题:安全蒸馏箱 链接:https://arxiv.org/abs/2112.03695

作者:Jingwen Ye,Yining Mao,Jie Song,Xinchao Wang,Cheng Jin,Mingli Song 机构: Zhejiang University, Hangzhou, National University of Singapore, Fudan University 备注:Accepted by AAAI2022 摘要:知识提炼(KD)最近成为一种将知识从预先训练过的教师模型转移到轻量级学生的强大策略,并在广泛的应用中取得了前所未有的成功。尽管取得了令人鼓舞的结果,KD过程本身对网络所有权保护构成了潜在威胁,因为网络中包含的知识可以毫不费力地提取出来,从而暴露给恶意用户。在本文中,我们提出了一个新的框架,称为安全蒸馏箱(SDB),它允许我们将预先训练好的模型包装在一个虚拟箱中,以保护知识产权。具体而言,SDB保留了包装模型对所有用户的推理能力,但排除了未经授权用户的KD。另一方面,对于授权用户,SDB执行知识扩充计划,以增强KD性能和学生模型的结果。换句话说,所有用户都可以使用SDB中的模型进行推理,但只有授权用户才能从该模型访问KD。建议的SDB对模型架构不施加任何约束,并且可以很容易地作为即插即用解决方案来保护预先训练的网络的所有权。各种数据集和体系结构的实验表明,使用SDB,未经授权KD的性能显著下降,而授权KD的性能得到增强,这证明了SDB的有效性。 摘要:Knowledge distillation (KD) has recently emerged as a powerful strategy to transfer knowledge from a pre-trained teacher model to a lightweight student, and has demonstrated its unprecedented success over a wide spectrum of applications. In spite of the encouraging results, the KD process per se poses a potential threat to network ownership protection, since the knowledge contained in network can be effortlessly distilled and hence exposed to a malicious user. In this paper, we propose a novel framework, termed as Safe Distillation Box (SDB), that allows us to wrap a pre-trained model in a virtual box for intellectual property protection. Specifically, SDB preserves the inference capability of the wrapped model to all users, but precludes KD from unauthorized users. For authorized users, on the other hand, SDB carries out a knowledge augmentation scheme to strengthen the KD performances and the results of the student model. In other words, all users may employ a model in SDB for inference, but only authorized users get access to KD from the model. The proposed SDB imposes no constraints over the model architecture, and may readily serve as a plug-and-play solution to protect the ownership of a pre-trained network. Experiments across various datasets and architectures demonstrate that, with SDB, the performance of an unauthorized KD drops significantly while that of an authorized gets enhanced, demonstrating the effectiveness of SDB.

推荐(1篇)

【1】 Cross-domain User Preference Learning for Cold-start Recommendation 标题:面向冷启动推荐的跨域用户偏好学习 链接:https://arxiv.org/abs/2112.03667

作者:Huiling Zhou,Jie Liu,Zhikang Li,Jin Yu,Hongxia Yang 机构:DAMO Academy, Alibaba Group, China 摘要:跨域冷启动推荐是推荐系统中一个日益突出的问题。现有工作主要集中在解决跨域用户推荐或冷启动内容推荐。然而,当一个新域处于发展早期阶段时,它拥有与源域相似的潜在用户,但交互要少得多。从源域学习用户偏好并将其迁移到目标域是至关重要的,尤其是针对用户反馈有限的新到达内容。为了弥补这一差距,我们提出了一个自训练的跨域用户偏好学习(COUPLE)框架,面向带有各类语义标签(如物品属性或视频类型)的冷启动推荐。更具体地,我们考虑三个层级的偏好,包括用户历史、用户内容和用户群组,以提供可靠的推荐。用户历史由域感知的序列模型表示,并对底层标签应用频率编码器以学习用户内容偏好。随后,提出了一种具有正交节点表示的分层记忆树,以进一步在跨域场景下泛化用户群组偏好。整个框架借助先进先出(FIFO)队列以对比方式更新,从而获得更具区分度的表示。在两个数据集上的大量实验证明了COUPLE在用户和内容冷启动情形下的有效性。通过部署为期一周的在线A/B测试,我们发现COUPLE的点击率(CTR)优于淘宝APP上使用的其他基线。目前该方法已在线上用于跨域冷启动微视频推荐。 摘要:Cross-domain cold-start recommendation is an increasingly emerging issue for recommender systems. Existing works mainly focus on solving either cross-domain user recommendation or cold-start content recommendation. However, when a new domain evolves at its early stage, it has potential users similar to the source domain but with much fewer interactions. It is critical to learn a user's preference from the source domain and transfer it into the target domain, especially on the newly arriving contents with limited user feedback. To bridge this gap, we propose a self-trained Cross-dOmain User Preference LEarning (COUPLE) framework, targeting cold-start recommendation with various semantic tags, such as attributes of items or genres of videos. More specifically, we consider three levels of preferences, including user history, user content and user group to provide reliable recommendation. With user history represented by a domain-aware sequential model, a frequency encoder is applied to the underlying tags for user content preference learning. Then, a hierarchical memory tree with orthogonal node representation is proposed to further generalize user group preference across domains. The whole framework updates in a contrastive way with a First-In-First-Out (FIFO) queue to obtain more distinctive representations. Extensive experiments on two datasets demonstrate the efficiency of COUPLE in both user and content cold-start situations. By deploying an online A/B test for a week, we show that the Click-Through-Rate (CTR) of COUPLE is superior to other baselines used on Taobao APP. Now the method is serving online for the cross-domain cold micro-video recommendation.

聚类(1篇)

【1】 Lattice-Based Methods Surpass Sum-of-Squares in Clustering 标题:基于格的方法在聚类问题中超越平方和方法 链接:https://arxiv.org/abs/2112.03898

作者:Ilias Zadik,Min Jae Song,Alexander S. Wein,Joan Bruna 机构:Department of Mathematics, Massachusetts Institute of Technology, Courant Institute of Mathematical Sciences, New York University, Simons Institute for the Theory of Computing, UC Berkeley, Center for Data Science, New York University 摘要:聚类是无监督学习中的一个基本原语,它产生了一类丰富的、具有计算挑战性的推理任务。在这项工作中,我们专注于对协方差未知(且可能退化)的$d$维高斯混合进行聚类这一典型任务。最近的工作(Ghosh et al.'20;Mao,Wein'21;Davis,Diaz,Wang'21)已经针对低次多项式方法和平方和(SoS)层次建立了下界,用于恢复高斯聚类实例中植入的某些隐藏结构。先前对许多类似推理任务的研究表明,这类下界强烈暗示聚类存在固有的统计-计算差距,即存在这样一个参数区间:聚类任务在统计上是可行的,但没有多项式时间算法能够成功。我们考虑的聚类任务的一个特例,等价于在一个本身随机的子空间中寻找一个植入的超立方体向量的问题。我们表明,也许令人惊讶的是,这个特定的聚类模型并不表现出统计-计算差距,尽管前面提到的低次与SoS下界在此情形下仍然成立。为了实现这一点,我们给出了一种基于Lenstra--Lenstra--Lovasz格基约简方法的多项式时间算法,该算法达到了$d+1$个样本的统计最优样本复杂度。这一结果扩展了这样一类问题的范围:其猜想中的统计-计算差距可以被"脆弱的"多项式时间算法"闭合";同时突出了噪声在统计-计算差距出现过程中关键而微妙的作用。 摘要:Clustering is a fundamental primitive in unsupervised learning which gives rise to a rich class of computationally-challenging inference tasks. In this work, we focus on the canonical task of clustering $d$-dimensional Gaussian mixtures with unknown (and possibly degenerate) covariance. Recent works (Ghosh et al. '20; Mao, Wein '21; Davis, Diaz, Wang '21) have established lower bounds against the class of low-degree polynomial methods and the sum-of-squares (SoS) hierarchy for recovering certain hidden structures planted in Gaussian clustering instances. Prior work on many similar inference tasks portends that such lower bounds strongly suggest the presence of an inherent statistical-to-computational gap for clustering, that is, a parameter regime where the clustering task is statistically possible but no polynomial-time algorithm succeeds. One special case of the clustering task we consider is equivalent to the problem of finding a planted hypercube vector in an otherwise random subspace. We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap, even though the aforementioned low-degree and SoS lower bounds continue to apply in this case. To achieve this, we give a polynomial-time algorithm based on the Lenstra--Lenstra--Lovasz lattice basis reduction method which achieves the statistically-optimal sample complexity of $d+1$ samples. This result extends the class of problems whose conjectured statistical-to-computational gaps can be "closed" by "brittle" polynomial-time algorithms, highlighting the crucial but subtle role of noise in the onset of statistical-to-computational gaps.

自动驾驶|车辆|车道检测等(1篇)

【1】 Causal Analysis and Classification of Traffic Crash Injury Severity Using Machine Learning Algorithms 标题:基于机器学习算法的交通碰撞伤严重程度原因分析与分类 链接:https://arxiv.org/abs/2112.03407

作者:Meghna Chakraborty,Timothy Gates,Subhrajit Sinha 机构:Department of Civil and Environmental Engineering, Michigan State University, South Shaw, Pacific Northwest National Laboratory, Battelle Blvd, Richland, WA 摘要:应用非参数方法对交通事故进行损伤严重程度的因果分析和分类受到了有限的关注。本研究采用不同的机器学习技术,包括决策树(DT)、随机森林(RF)、极端梯度增强(XGBoost)和深度神经网络(DNN),提出了一个因果推断的方法框架,使用格兰杰因果关系分析和州际交通事故伤害严重程度分类。本研究中使用的数据是针对2014年至2019年间德克萨斯州所有州际公路上的交通事故获得的。建议的严重性分类方法的输出包括致命和严重伤害(KA)碰撞、非严重和可能伤害(BC)碰撞以及仅财产损失(PDO)碰撞的三类。格兰杰因果关系有助于确定影响碰撞严重性的最具影响力的因素,而基于学习的模型预测了性能不同的严重性等级。Granger因果关系分析的结果确定,限速、地面和天气条件、交通量、工作区的存在、工作区的工人和高占用率车辆(HOV)车道等是影响碰撞严重性的最重要因素。分类器的预测性能在不同类别中产生不同的结果。具体而言,虽然决策树和随机森林分类器分别为数据中最稀有的KA类的PDO和BC严重性提供了最大的性能,但深度神经网络分类器的性能优于所有其他算法,这很可能是由于其逼近非线性模型的能力。本研究有助于使用非参数方法对交通碰撞损伤严重程度进行因果分析和分类预测,这方面的知识非常有限。 摘要:Causal analysis and classification of injury severity applying non-parametric methods for traffic crashes has received limited attention. This study presents a methodological framework for causal inference, using Granger causality analysis, and injury severity classification of traffic crashes, occurring on interstates, with different machine learning techniques including decision trees (DT), random forest (RF), extreme gradient boosting (XGBoost), and deep neural network (DNN). The data used in this study were obtained for traffic crashes on all interstates across the state of Texas from a period of six years between 2014 and 2019. The output of the proposed severity classification approach includes three classes for fatal and severe injury (KA) crashes, non-severe and possible injury (BC) crashes, and property damage only (PDO) crashes. While Granger Causality helped identify the most influential factors affecting crash severity, the learning-based models predicted the severity classes with varying performance. The results of Granger causality analysis identified the speed limit, surface and weather conditions, traffic volume, presence of workzones, workers in workzones, and high occupancy vehicle (HOV) lanes, among others, as the most important factors affecting crash severity. The prediction performance of the classifiers yielded varying results across the different classes. Specifically, while decision tree and random forest classifiers provided the greatest performance for PDO and BC severities, respectively, for the KA class, the rarest class in the data, deep neural net classifier performed superior than all other algorithms, most likely due to its capability of approximating nonlinear models. This study contributes to the limited body of knowledge pertaining to causal analysis and classification prediction of traffic crash injury severity using non-parametric approaches.
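下面给出该流程(格兰杰因果筛查 + 基于学习的严重程度分类)的一个简化示意,使用 statsmodels 的 grangercausalitytests 与 sklearn 的随机森林;其中的合成序列与特征仅为演示假设,并非论文使用的德克萨斯州事故数据。

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests
from sklearn.ensemble import RandomForestClassifier

# 第一步:格兰杰因果检验,筛查某一因素(如限速)对目标序列(如严重程度指标)是否有预测力。
rng = np.random.default_rng(0)
n = 300
speed_limit = rng.normal(60, 10, n)
severity = np.roll(speed_limit, 1) * 0.05 + rng.normal(0, 1, n)   # 人为构造带滞后影响的序列

# 检验第二列是否格兰杰导致第一列
res = grangercausalitytests(np.column_stack([severity, speed_limit]), maxlag=2)
p_value = res[1][0]["ssr_ftest"][1]
print("lag-1 p-value:", round(p_value, 4))

# 第二步:用(示意性的)特征训练分类器,预测三类严重程度 KA / BC / PDO。
X = rng.normal(size=(n, 5))
y = rng.integers(0, 3, size=n)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:250], y[:250])
print("toy accuracy:", clf.score(X[250:], y[250:]))
```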

点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】 Wild ToFu: Improving Range and Quality of Indirect Time-of-Flight Depth with RGB Fusion in Challenging Environments 标题:野生豆腐:在具有挑战性的环境中通过RGB融合提高间接飞行时间深度的范围和质量 链接:https://arxiv.org/abs/2112.03750

作者:HyunJun Jung,Nikolas Brasch,Ales Leonardis,Nassir Navab,Benjamin Busam 机构: Technical University of Munich, Huawei Noah’s Ark Lab 摘要:间接飞行时间(I-ToF)成像因其体积小、价格合理而成为移动设备深度估计的一种广泛方式。以往的工作主要集中在改善I-ToF成像质量,特别是治疗多径干扰(MPI)的影响。这些调查通常在近距离、室内和微弱环境光下的特定受限场景中进行。令人惊讶的是,在现实生活场景中,由于传感器功率和光散射有限的衰减导致大量诱导散粒噪声和信号稀疏,因此在强环境光和远距离情况下,研究I-ToF质量改善的工作很少。在这项工作中,我们提出了一种新的基于学习的端到端深度预测网络,该网络采用带噪的原始I-ToF信号和RGB图像,并基于多步方法融合其潜在表示,包括隐式和显式对齐,以预测与RGB视点对齐的高质量远程深度图。我们在具有挑战性的真实场景上测试了我们的方法,与基线方法相比,最终深度图上的RMSE提高了40%以上。 摘要:Indirect Time-of-Flight (I-ToF) imaging is a widespread way of depth estimation for mobile devices due to its small size and affordable price. Previous works have mainly focused on quality improvement for I-ToF imaging especially curing the effect of Multi Path Interference (MPI). These investigations are typically done in specifically constrained scenarios at close distance, indoors and under little ambient light. Surprisingly little work has investigated I-ToF quality improvement in real-life scenarios where strong ambient light and far distances pose difficulties due to an extreme amount of induced shot noise and signal sparsity, caused by the attenuation with limited sensor power and light scattering. In this work, we propose a new learning based end-to-end depth prediction network which takes noisy raw I-ToF signals as well as an RGB image and fuses their latent representation based on a multi step approach involving both implicit and explicit alignment to predict a high quality long range depth map aligned to the RGB viewpoint. We test our approach on challenging real-world scenes and show more than 40% RMSE improvement on the final depth map compared to the baseline approach.

联邦学习|隐私保护|加密(1篇)

【1】 Communication and Energy Efficient Slimmable Federated Learning via Superposition Coding and Successive Decoding 标题:基于叠加编码和逐次解码的通信和能量高效的可精简联邦学习 链接:https://arxiv.org/abs/2112.03267

作者:Hankyul Baek,Won Joon Yun,Soyi Jung,Jihong Park,Mingyue Ji,Joongheon Kim,Mehdi Bennis 机构: 1Korea University, 2Deakin University, 3The University of Utah, 4University of Oulu 备注:11 pages, 10 Figures, presented at the International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021 (FL-ICML'21). arXiv admin note: substantial text overlap with arXiv:2112.02543 摘要:移动设备是大数据不可或缺的来源。联邦学习(FL)通过交换本地训练的模型而不是原始数据,在利用这些私有数据方面具有巨大的潜力。然而,移动设备通常能量有限且无线连接,FL无法灵活应对其异构且随时间变化的能量容量和通信吞吐量,从而限制了采用。基于这些问题,我们提出了一个新的能源和通信效率FL框架,即SlimFL。为了解决能量容量不均匀的问题,SlimFL中的每个设备都运行一个宽度可调的可精简神经网络(SNN)。为了解决异构通信吞吐量问题,每个全宽(1.0x)SNN模型及其半宽($0.5$x)模型在传输前进行叠加编码,并在接收后根据信道质量连续解码为0.5x或$1.0$x模型。仿真结果表明,SlimFL可以同时训练$0.5$x和$1.0$x两个模型,具有合理的精度和收敛速度,而香草FL则可以使用$2$x的通信资源分别训练这两个模型。令人惊讶的是,对于较差的信道和非IID数据分布,SlimFL比香草FL在更低的能量足迹下实现了更高的精度,在此情况下香草FL收敛较慢。 摘要:Mobile devices are indispensable sources of big data. Federated learning (FL) has a great potential in exploiting these private data by exchanging locally trained models instead of their raw data. However, mobile devices are often energy limited and wirelessly connected, and FL cannot cope flexibly with their heterogeneous and time-varying energy capacity and communication throughput, limiting the adoption. Motivated by these issues, we propose a novel energy and communication efficient FL framework, coined SlimFL. To resolve the heterogeneous energy capacity problem, each device in SlimFL runs a width-adjustable slimmable neural network (SNN). To address the heterogeneous communication throughput problem, each full-width (1.0x) SNN model and its half-width ($0.5$x) model are superposition-coded before transmission, and successively decoded after reception as the 0.5x or $1.0$x model depending on the channel quality. Simulation results show that SlimFL can simultaneously train both $0.5$x and $1.0$x models with reasonable accuracy and convergence speed, compared to its vanilla FL counterpart separately training the two models using $2$x more communication resources. Surprisingly, SlimFL achieves even higher accuracy with lower energy footprints than vanilla FL for poor channels and non-IID data distributions, under which vanilla FL converges slowly.

推理|分析|理解|解释(6篇)

【1】 Disentangled Counterfactual Recurrent Networks for Treatment Effect Inference over Time 标题:用于治疗效果随时间推断的解缠反事实递归网络 链接:https://arxiv.org/abs/2112.03811

作者:Jeroen Berrevoets,Alicia Curth,Ioana Bica,Eoin McKinney,Mihaela van der Schaar 机构:University of Cambridge, The Alan Turing Institute, University of Oxford, University of California, Los Angeles (UCLA) 摘要:为每位患者选择最佳治疗计划需要准确预测其随时间变化的治疗结果轨迹。虽然大型观测数据集构成了可供学习的丰富信息来源,但它们也包含偏见,因为在实践中很少随机分配治疗。为了提供准确无偏的预测,我们引入了解纠缠反事实回归网络(DCRN),这是一种新的序列对序列架构,通过学习患者历史的表示来估计随时间的治疗结果,这些患者历史被解纠缠为三个独立的潜在因素:治疗因素,仅影响治疗选择;结果因素,仅影响结果;这是一个混杂因素,对两者都有影响。通过一个完全受治疗影响随时间变化的因果结构启发的架构,我们提高了预测准确性和疾病理解,因为我们的架构允许从业者推断哪些患者特征影响患者轨迹中的哪一部分,与该领域的其他方法相比。我们证明,无论是在真实数据还是模拟数据上,DCRN在预测治疗反应方面都优于当前最先进的方法。 摘要:Choosing the best treatment-plan for each individual patient requires accurate forecasts of their outcome trajectories as a function of the treatment, over time. While large observational data sets constitute rich sources of information to learn from, they also contain biases as treatments are rarely assigned randomly in practice. To provide accurate and unbiased forecasts, we introduce the Disentangled Counterfactual Recurrent Network (DCRN), a novel sequence-to-sequence architecture that estimates treatment outcomes over time by learning representations of patient histories that are disentangled into three separate latent factors: a treatment factor, influencing only treatment selection; an outcome factor, influencing only the outcome; and a confounding factor, influencing both. With an architecture that is completely inspired by the causal structure of treatment influence over time, we advance forecast accuracy and disease understanding, as our architecture allows for practitioners to infer which patient features influence which part in a patient's trajectory, contrasting other approaches in this domain. We demonstrate that DCRN outperforms current state-of-the-art methods in forecasting treatment responses, on both real and simulated data.

【2】 Tell me why! -- Explanations support learning of relational and causal structure 标题:告诉我为什么!--解释有助于学习关系结构和因果结构 链接:https://arxiv.org/abs/2112.03753

作者:Andrew K. Lampinen,Nicholas A. Roy,Ishita Dasgupta,Stephanie C. Y. Chan,Allison C. Tam,James L. McClelland,Chen Yan,Adam Santoro,Neil C. Rabinowitz,Jane X. Wang,Felix Hill 机构:DeepMind, London, UK 备注:22 pages 摘要:解释在人类学习中扮演着相当重要的角色,特别是在那些对人工智能而言仍是重大挑战的领域——形成抽象,以及学习世界的关系结构和因果结构。在此,我们探讨强化学习智能体是否同样能从解释中获益。我们概述了一系列关系型任务,这些任务要求从一个集合中选出与众不同的对象(即在许多可能的特征维度中的某一个维度上是唯一的)。这类"找不同"(odd-one-out)任务要求智能体对一组对象之间的多维关系进行推理。我们表明,智能体仅靠奖励无法很好地学习这些任务;但当它们同时被训练去生成解释对象属性、或解释某个选择为何正确或错误的语言时,其成绩可超过90%。在进一步的实验中,我们展示了预测解释如何使智能体能够从模棱两可、存在因果混淆的训练中恰当地泛化,甚至元学习去执行实验性干预以识别因果结构。我们表明,解释有助于克服智能体只关注简单特征的倾向,并探究了解释的哪些方面使其最为有益。我们的结果表明,从解释中学习是一个强大的原则,有望为训练更鲁棒、更通用的机器学习系统提供一条有前景的途径。 摘要:Explanations play a considerable role in human learning, especially in areas that remain major challenges for AI -- forming abstractions, and learning about the relational and causal structure of the world. Here, we explore whether reinforcement learning agents might likewise benefit from explanations. We outline a family of relational tasks that involve selecting an object that is the odd one out in a set (i.e., unique along one of many possible feature dimensions). Odd-one-out tasks require agents to reason over multi-dimensional relationships among a set of objects. We show that agents do not learn these tasks well from reward alone, but achieve >90% performance when they are also trained to generate language explaining object properties or why a choice is correct or incorrect. In further experiments, we show how predicting explanations enables agents to generalize appropriately from ambiguous, causally-confounded training, and even to meta-learn to perform experimental interventions to identify causal structure. We show that explanations help overcome the tendency of agents to fixate on simple features, and explore which aspects of explanations make them most beneficial. Our results suggest that learning from explanations is a powerful principle that could offer a promising path towards training more robust and general machine learning systems.

【3】 Scaling Structured Inference with Randomization 标题:基于随机化的伸缩结构化推理 链接:https://arxiv.org/abs/2112.03638

作者:Yao Fu,Mirella Lapata 机构:Institute for Language 备注:Preprint 摘要:在深度学习时代,离散图模型状态空间的规模对于模型容量至关重要。现有的基于动态规划(DP)的推理通常只适用于少量状态(通常少于数百个)。在这项工作中,我们提出了一族随机动态规划(RDP)算法,用于将结构化模型扩展到数万个潜在状态。我们的方法广泛适用于经典的基于DP的推理(配分函数、边缘概率、重参数化、熵等)和不同的图结构(链、树以及更一般的超图)。它还与自动微分兼容,因此可以与神经网络无缝集成,并使用基于梯度的优化器进行学习。我们的核心技术是随机化,即把DP限制在选出的一小部分节点上并重新加权,从而将计算量降低几个数量级。借助Rao-Blackwell化和重要性采样,我们进一步实现了低偏差和低方差。在不同图结构上进行的多种推理实验证明了我们方法的准确性和高效性。此外,当使用RDP训练大规模结构化VAE时,它在测试似然方面优于基线,并成功地防止了后验坍塌。 摘要:The scale of the state space of discrete graphical models is crucial for model capacity in the era of deep learning. Existing dynamic programming (DP) based inference typically works with a small number of states (usually less than hundreds). In this work, we propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states. Our method is widely applicable to classical DP-based inference (partition, marginal, reparameterization, entropy, etc.) and different graph structures (chains, trees, and more general hypergraphs). It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly and learned with gradient-based optimizers. Our core technique is randomization, which is to restrict and reweight DP on a small selected subset of nodes, leading to computation reduction by orders of magnitudes. We further achieve low bias and variance with Rao-Blackwellization and importance sampling. Experiments on different inferences over different graphs demonstrate the accuracy and efficiency of our methods. Furthermore, when using RDP to train a scaled structured VAE, it outperforms baselines in terms of test likelihood and successfully prevents posterior collapse.

【4】 A Novel Convergence Analysis for Algorithms of the Adam Family 标题:亚当家族算法的一种新的收敛性分析 链接:https://arxiv.org/abs/2112.03459

作者:Zhishuai Guo,Yi Xu,Wotao Yin,Rong Jin,Tianbao Yang 机构:†Department of Computer Science, The University of Iowa, Iowa City, IA , USA, ‡Machine Intelligence Technology, Alibaba Group, Bellevue, WA , USA 备注:In NeurIPS OPT Workshop 2021. arXiv admin note: substantial text overlap with arXiv:2104.14840 摘要:自2014年发明以来,Adam优化器一直备受关注。一方面,它在深度学习中得到了广泛的应用,许多变体被提出,另一方面,它们的理论收敛性仍然是一个谜。从某种意义上说,有些研究要求对更新进行强有力的假设,这在实践中并不一定适用,而其他研究仍然遵循Adam最初的有问题的收敛分析,这被证明不足以确保收敛。尽管Adam存在严格的收敛性分析,但它们对自适应步长的更新提出了具体要求,这些要求不够通用,无法涵盖Adam的许多其他变体。为了解决这些问题,在这个扩展的摘要中,我们给出了一系列Adam风格方法(包括Adam、AMSGrad、Adabound等)的简单而通用的收敛性证明。我们的分析只需要一阶矩的增大或较大的“动量”参数,这实际上是在实践中使用的情况,以及步长自适应因子的有界条件,这适用于随机梯度温和条件下Adam的所有变量。我们还建立了所用随机梯度估计的方差递减结果。事实上,我们对Adam的分析非常简单和通用,因此可以利用Adam建立收敛性,以解决更广泛的非凸优化问题,包括最小-最大、组合和双层优化问题。有关此扩展摘要的完整(早期)版本,请参阅arXiv:2104.14840。 摘要:Since its invention in 2014, the Adam optimizer has received tremendous attention. On one hand, it has been widely used in deep learning and many variants have been proposed, while on the other hand their theoretical convergence property remains to be a mystery. It is far from satisfactory in the sense that some studies require strong assumptions about the updates, which are not necessarily applicable in practice, while other studies still follow the original problematic convergence analysis of Adam, which was shown to be not sufficient to ensure convergence. Although rigorous convergence analysis exists for Adam, they impose specific requirements on the update of the adaptive step size, which are not generic enough to cover many other variants of Adam. To address theses issues, in this extended abstract, we present a simple and generic proof of convergence for a family of Adam-style methods (including Adam, AMSGrad, Adabound, etc.). Our analysis only requires an increasing or large "momentum" parameter for the first-order moment, which is indeed the case used in practice, and a boundness condition on the adaptive factor of the step size, which applies to all variants of Adam under mild conditions of stochastic gradients. We also establish a variance diminishing result for the used stochastic gradient estimators. Indeed, our analysis of Adam is so simple and generic that it can be leveraged to establish the convergence for solving a broader family of non-convex optimization problems, including min-max, compositional, and bilevel optimization problems. For the full (earlier) version of this extended abstract, please refer to arXiv:2104.14840.
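为便于对照,下面写出 Adam 族方法共享的标准更新形式(记号为通常约定,为简洁省略了偏差校正项);摘要中的收敛条件大致对应一阶矩系数取较大值(较强动量)以及自适应步长因子有界。

```latex
% Adam 风格方法的通用更新(省略偏差校正):
m_t = \beta_1\, m_{t-1} + (1-\beta_1)\, g_t,            % 一阶矩("动量")
\quad
v_t = \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^{2},        % 二阶矩
\quad
\theta_{t+1} = \theta_t - \eta\, \frac{m_t}{\sqrt{v_t} + \epsilon}.
% 摘要中的条件大致是:\beta_1 取较大值(即较强的"动量"),
% 且自适应步长因子 \eta / (\sqrt{v_t} + \epsilon) 有界。
```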

【5】 Understanding Square Loss in Training Overparametrized Neural Network Classifiers 标题:理解超参数化神经网络分类器训练中的平方损失 链接:https://arxiv.org/abs/2112.03657

作者:Tianyang Hu,Jun Wang,Wenjia Wang,Zhenguo Li 机构:Huawei Noah’s Ark Lab, HKUST 摘要:深度学习在现代分类任务中取得了许多突破。对于不同的数据结构,已经提出了许多体系结构,但对于损失函数,交叉熵损失是主要的选择。最近,一些替代损失使深度分类器的兴趣复活。特别是,经验证据似乎促进了平方损失,但仍然缺乏理论依据。在这项工作中,我们系统地研究了平方损失在神经切线核(NTK)机制下对过参数化神经网络的性能,从而有助于对分类中平方损失的理论理解。揭示了有关泛化误差、鲁棒性和校准误差的有趣特性。我们考虑两种情况,根据类是可分离的还是不分离的。在一般的不可分情况下,对误分类率和校准误差都建立了快速收敛速度。当类是可分离的时,错误分类率会以指数级的速度提高。此外,所得到的裕度被证明是远离零的下限,为鲁棒性提供了理论保证。我们希望我们的发现能够超越NTK制度,并转化为实际情况。为此,我们对实际神经网络进行了广泛的实证研究,证明了平方损失在合成低维数据和真实图像数据中的有效性。与交叉熵相比,平方损失具有可比的泛化误差,但在鲁棒性和模型校准方面具有显著优势。 摘要:Deep learning has achieved many breakthroughs in modern classification tasks. Numerous architectures have been proposed for different data structures but when it comes to the loss function, the cross-entropy loss is the predominant choice. Recently, several alternative losses have seen revived interests for deep classifiers. In particular, empirical evidence seems to promote square loss but a theoretical justification is still lacking. In this work, we contribute to the theoretical understanding of square loss in classification by systematically investigating how it performs for overparametrized neural networks in the neural tangent kernel (NTK) regime. Interesting properties regarding the generalization error, robustness, and calibration error are revealed. We consider two cases, according to whether classes are separable or not. In the general non-separable case, fast convergence rate is established for both misclassification rate and calibration error. When classes are separable, the misclassification rate improves to be exponentially fast. Further, the resulting margin is proven to be lower bounded away from zero, providing theoretical guarantees for robustness. We expect our findings to hold beyond the NTK regime and translate to practical settings. To this end, we conduct extensive empirical studies on practical neural networks, demonstrating the effectiveness of square loss in both synthetic low-dimensional data and real image data. Comparing to cross-entropy, square loss has comparable generalization error but noticeable advantages in robustness and model calibration.
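下面用一个小的PyTorch示意对比交叉熵损失与平方损失(对 one-hot 标签做 MSE)训练同一个分类器;网络与数据均为合成的示意设置,与论文中的NTK理论分析及真实图像实验无关,仅用于说明"平方损失同样可以训练分类器"这一做法。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 同一个小网络,分别用交叉熵与平方损失训练,比较训练集准确率。
def train(loss_name, epochs=200):
    torch.manual_seed(0)
    X = torch.randn(512, 20)
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()          # 两类的合成数据
    net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(epochs):
        logits = net(X)
        if loss_name == "cross_entropy":
            loss = F.cross_entropy(logits, y)
        else:                                          # 平方损失:logits 去拟合 one-hot 编码
            loss = F.mse_loss(logits, F.one_hot(y, 2).float())
        opt.zero_grad(); loss.backward(); opt.step()
    acc = (net(X).argmax(1) == y).float().mean().item()
    return acc

print("CE acc:", train("cross_entropy"), " square-loss acc:", train("square"))
```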

【6】 Grain segmentation in atomistic simulations using orientation-based iterative self-organizing data analysis 标题:基于方向迭代自组织数据分析的原子模拟晶粒分割 链接:https://arxiv.org/abs/2112.03348

作者:M. Vimal,S. Sandfeld,A. Prakash 机构:Micromechanical Materials Modelling (MiMM), Institute of Mechanics and Fluid Dynamcis, TU Bergakademie Freiberg, Lampadiusstraße , Freiberg, Germany, Institute for Advanced Simulation – IAS-,: Materials Data Science and Informatics 备注:40 pages, 8 figures, 9 supplementary figures 摘要:原子模拟现在已经成为在原子尺度上理解材料变形机制的不可或缺的工具。大规模模拟经常用于研究多晶材料在纳米尺度上的行为。在这项工作中,我们提出了一种使用无监督机器学习算法对原子结构进行颗粒分割的方法,该算法根据原子的方向将原子聚类成单个颗粒。所提出的方法称为Orisodata算法,基于迭代自组织数据分析技术,并进行了修改,以在方向空间中工作。该算法在未变形和变形状态下的122晶粒纳米晶薄膜样品上进行了验证。Orisodata算法还与开源可视化工具Ovito中提供的其他两种谷物分割算法进行了比较。结果表明,Orisodata算法能够正确识别变形孪晶以及由低角度晶界分隔的区域。模型参数具有直观的物理意义,并与实验中使用的相似阈值相关,这不仅有助于获得最佳值,而且便于解释和验证结果。 摘要:Atomistic simulations have now established themselves as an indispensable tool in understanding deformation mechanisms of materials at the atomic scale. Large scale simulations are regularly used to study the behavior of polycrystalline materials at the nanoscale. In this work, we propose a method for grain segmentation of an atomistic configuration using an unsupervised machine learning algorithm that clusters atoms into individual grains based on their orientation. The proposed method, called the Orisodata algorithm, is based on the iterative self-organizing data analysis technique and is modified to work in the orientation space. The working of the algorithm is demonstrated on a 122 grain nanocrystalline thin film sample in both undeformed and deformed states. The Orisodata algorithm is also compared with two other grain segmentation algorithms available in the open-source visualization tool Ovito. The results show that the Orisodata algorithm is able to correctly identify deformation twins as well as regions separated by low angle grain boundaries. The model parameters have intuitive physical meaning and relate to similar thresholds used in experiments, which not only helps obtain optimal values but also facilitates easy interpretation and validation of results.

检测相关(6篇)

【1】 In-flight Novelty Detection with Convolutional Neural Networks 标题:基于卷积神经网络的飞行新颖性检测 链接:https://arxiv.org/abs/2112.03765

作者:Adam Hartwell,Felipe Montana,Will Jacobs,Visakan Kadirkamanathan,Andrew R Mills,Tom Clark 机构:Department of Automatic Control and Systems Engineering, University of Sheffield, UK, Rolls-Royce Plc, UK 摘要:燃气轮机发动机是一种复杂的机器,通常会产生大量数据,需要仔细监控,以实现经济高效的预防性维护。在航空航天应用中,将所有测量数据返回地面的成本高得令人望而却步,常常导致有用的、高价值的数据被丢弃。因此,实时检测、排序和返回有用数据的能力至关重要。本文提出,由正态性卷积神经网络模型描述的系统输出测量值应实时优先,以引起预防性维护决策者的注意。由于燃气轮机发动机时变行为的复杂性,很难导出精确的物理模型,并且往往导致模型预测精度低,与实时执行不兼容。数据驱动建模是一种理想的替代方法,可以生成高精度、特定于资产的模型,而无需从第一原理推导。我们提出了一个数据驱动系统,用于在线检测和优先处理异常数据。通过将不确定性管理集成到深度神经预测模型中,避免了由新操作条件产生的有偏数据评估。对真实和合成数据进行测试,显示对真实和合成故障的敏感性。该系统能够在低功耗嵌入式硬件上实时运行,目前正在劳斯莱斯Pearl 15发动机飞行试验中部署。 摘要:Gas turbine engines are complex machines that typically generate a vast amount of data, and require careful monitoring to allow for cost-effective preventative maintenance. In aerospace applications, returning all measured data to ground is prohibitively expensive, often causing useful, high value, data to be discarded. The ability to detect, prioritise, and return useful data in real-time is therefore vital. This paper proposes that system output measurements, described by a convolutional neural network model of normality, are prioritised in real-time for the attention of preventative maintenance decision makers. Due to the complexity of gas turbine engine time-varying behaviours, deriving accurate physical models is difficult, and often leads to models with low prediction accuracy and incompatibility with real-time execution. Data-driven modelling is a desirable alternative producing high accuracy, asset specific models without the need for derivation from first principles. We present a data-driven system for online detection and prioritisation of anomalous data. Biased data assessment deriving from novel operating conditions is avoided by uncertainty management integrated into the deep neural predictive model. Testing is performed on real and synthetic data, showing sensitivity to both real and synthetic faults. The system is capable of running in real-time on low-power embedded hardware and is currently in deployment on the Rolls-Royce Pearl 15 engine flight trials.

【2】 Two-stage Deep Stacked Autoencoder with Shallow Learning for Network Intrusion Detection System 标题:用于网络入侵检测系统的浅学习两级深栈自动编码器 链接:https://arxiv.org/abs/2112.03704

作者:Nasreen Fathima,Akshara Pramod,Yash Srivastava,Anusha Maria Thomas,Syed Ibrahim S P,Chandran K R 机构:a Research Scholar, School of Computer Science and Engineering, Vellore Institute of Technology, Chennai Campus, Tamil Nadu, India, b School of Electronics Engineering, Vellore Institute of Technology, Chennai Campus, Tamil Nadu, India 备注:8 pages, 3 figures 摘要:零星事件,如实时网络流量中的恶意攻击,已导致大型组织收入损失大幅增加。这是由于网络的过度增长及其与过多人群的接触。用于检测入侵的标准方法没有前途,并且在识别新恶意软件方面存在重大失败。此外,在处理大量稀疏数据、高误报率、小班较少的检测率、训练时间和数据维度的特征工程方面的挑战促进了深度学习以较少的时间和巨大的成果接管任务。现有系统在解决实时网络流量问题以及特征工程方面需要改进。我们提出的工作克服了这些挑战,通过在两个阶段中使用深层自动编码器获得了有希望的结果。两阶段深度学习与浅层学习相结合,第二阶段使用随机森林进行分类。这使得该模型与最新的加拿大网络安全研究所-入侵检测系统2017(CICIDS-2017)数据集配合良好。实现了零误报和令人钦佩的检测精度。 摘要:Sparse events, such as malign attacks in real-time network traffic, have caused big organisations an immense hike in revenue loss. This is due to the excessive growth of the network and its exposure to a plethora of people. The standard methods used to detect intrusions are not promising and have significant failure to identify new malware. Moreover, the challenges in handling high volume data with sparsity, high false positives, fewer detection rates in minor class, training time and feature engineering of the dimensionality of data has promoted deep learning to take over the task with less time and great results. The existing system needs improvement in solving real-time network traffic issues along with feature engineering. Our proposed work overcomes these challenges by giving promising results using deep-stacked autoencoders in two stages. The two-stage deep learning combines with shallow learning using the random forest for classification in the second stage. This made the model get well with the latest Canadian Institute for Cybersecurity - Intrusion Detection System 2017 (CICIDS-2017) dataset. Zero false positives with admirable detection accuracy were achieved.

【3】 Neural Networks for Infectious Diseases Detection: Prospects and Challenges 标题:神经网络在传染病检测中的应用前景与挑战 链接:https://arxiv.org/abs/2112.03571

作者:Muhammad Azeem,Shumaila Javaid,Hamza Fahim,Nasir Saeed 备注:Submitted to IEEE/ACM Transactions on Computational Biology and Bioinformatics 摘要:人工神经网络(ANN)具备学习、纠错以及将大量原始数据转化为可用于治疗和护理的有效医疗决策的能力,因其有助于提高患者安全和护理质量而日益受到欢迎。因此,本文综述了人工神经网络在为患者医疗决策和高效疾病诊断提供有价值见解方面的关键作用。我们全面回顾了现有文献中提出的、推动ANN适应复杂应用的不同类型人工神经网络。此外,我们还考察了人工神经网络在多种疾病(如病毒性疾病、皮肤病、癌症和COVID-19)诊断和治疗方面的进展。进一步地,为提高COVID-19的检测精度,我们提出了一种名为ConXNet的新型深度卷积神经网络(CNN)模型。ConXNet使用不同的数据集进行训练和测试,其检测准确率和精确率均超过97%,明显优于现有模型。最后,我们强调了未来的研究方向和挑战,如算法复杂性、可用数据不足、隐私与安全,以及生物传感与人工神经网络的集成。要扩大人工神经网络在医疗诊断和治疗中的应用范围,需要对这些研究方向给予足够的重视。 摘要:Artificial neural network (ANN) ability to learn, correct errors, and transform a large amount of raw data into useful medical decisions for treatment and care have increased its popularity for enhanced patient safety and quality of care. Therefore, this paper reviews the critical role of ANNs in providing valuable insights for patients' healthcare decisions and efficient disease diagnosis. We thoroughly review different types of ANNs presented in the existing literature that advanced ANNs adaptation for complex applications. Moreover, we also investigate ANN's advances for various disease diagnoses and treatments such as viral, skin, cancer, and COVID-19. Furthermore, we propose a novel deep Convolutional Neural Network (CNN) model called ConXNet for improving the detection accuracy of COVID-19 disease. ConXNet is trained and tested using different datasets, and it achieves more than 97% detection accuracy and precision, which is significantly better than existing models. Finally, we highlight future research directions and challenges such as complexity of the algorithms, insufficient available data, privacy and security, and integration of biosensing with ANNs. These research directions require considerable attention for improving the scope of ANNs for medical diagnostic and treatment applications.

【4】 Test-Time Detection of Backdoor Triggers for Poisoned Deep Neural Networks 标题:中毒深度神经网络后门触发器的测试时间检测 链接:https://arxiv.org/abs/2112.03350

作者:Xi Li,Zhen Xiang,David J. Miller,George Kesidis 机构:School of EECS, Pennsylvania State University 摘要:后门(特洛伊木马)攻击是针对深度神经网络(DNN)的新兴威胁。当来自任何源类的测试样本嵌入后门模式时,被攻击的DNN将向攻击者预测所需的目标类;正确分类干净(无攻击)的测试样本。现有的后门防御在检测DNN是否受到攻击以及在“训练后”机制中反向工程后门模式方面已显示出成功:防御者可以访问待检查的DNN和独立收集的小型干净数据集,但无法访问DNN的训练集(可能中毒)。然而,这些防御既不能捕获触发后门映射的罪犯,也不能在测试时减轻后门攻击。在本文中,我们提出了一种针对图像分类后门攻击的“飞行中”防御方法,即1)在测试时检测后门触发器的使用;以及2)推断检测到的触发器示例的源类(源类)。我们的防御有效性在不同的强后门攻击下得到了实验验证。 摘要:Backdoor (Trojan) attacks are emerging threats against deep neural networks (DNN). A DNN being attacked will predict to an attacker-desired target class whenever a test sample from any source class is embedded with a backdoor pattern; while correctly classifying clean (attack-free) test samples. Existing backdoor defenses have shown success in detecting whether a DNN is attacked and in reverse-engineering the backdoor pattern in a "post-training" regime: the defender has access to the DNN to be inspected and a small, clean dataset collected independently, but has no access to the (possibly poisoned) training set of the DNN. However, these defenses neither catch culprits in the act of triggering the backdoor mapping, nor mitigate the backdoor attack at test-time. In this paper, we propose an "in-flight" defense against backdoor attacks on image classification that 1) detects use of a backdoor trigger at test-time; and 2) infers the class of origin (source class) for a detected trigger example. The effectiveness of our defense is demonstrated experimentally against different strong backdoor attacks.

【5】 Smart Metering System Capable of Anomaly Detection by Bi-directional LSTM Autoencoder 标题:双向LSTM自动编码器异常检测智能抄表系统 链接:https://arxiv.org/abs/2112.03275

作者:Sangkeum Lee,Hojun Jin,Sarvar Hussain Nengroo,Yoonmee Doh,Chungho Lee,Taewook Heo,Dongsoo Har 机构:Environment ICT Research, Section, Electronics and, Telecommunications Research, Institute (ETRI), Daejeon , South Korea, The Cho Chun Shik Graduate, School of Green Transportation, Korea Advanced Institute of, Science and Technology (KAIST) 备注:6 pages, 6 figures, accepted by "IEEE 40th International Conference on Consumer Electronics" 摘要:异常检测涉及广泛的应用,如故障检测、系统监控和事件检测。从智能计量系统获得的计量数据中识别异常是提高电力系统可靠性、稳定性和效率的关键任务。本文提出了一种异常检测过程,以发现在智能计量系统中观察到的异常值。在该方法中,使用基于双向长短时记忆(BiLSTM)的自动编码器来发现异常数据点。它利用非异常数据通过自动编码器计算重建误差,并通过预先设定的阈值从非异常数据中分离出待分类为异常的异常值。基于BiLSTM自动编码器的异常检测方法通过从985户家庭收集的4种能源(电/水/供暖/热水)的计量数据进行测试。 摘要:Anomaly detection is concerned with a wide range of applications such as fault detection, system monitoring, and event detection. Identifying anomalies from metering data obtained from smart metering system is a critical task to enhance reliability, stability, and efficiency of the power system. This paper presents an anomaly detection process to find outliers observed in the smart metering system. In the proposed approach, bi-directional long short-term memory (BiLSTM) based autoencoder is used and finds the anomalous data point. It calculates the reconstruction error through autoencoder with the non-anomalous data, and the outliers to be classified as anomalies are separated from the non-anomalous data by predefined threshold. Anomaly detection method based on the BiLSTM autoencoder is tested with the metering data corresponding to 4 types of energy sources electricity/water/heating/hot water collected from 985 households.
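下面给出"BiLSTM 自动编码器 + 重建误差阈值"这一检测流程的极简PyTorch示意;序列长度、隐藏维度、阈值规则与数据均为示意性假设,并非论文所用的真实抄表数据与模型配置。

```python
import torch
import torch.nn as nn

# 用正常序列训练 BiLSTM 自动编码器做重建,重建误差超过阈值的序列判为异常。
class BiLSTMAE(nn.Module):
    def __init__(self, n_features=1, hidden=16):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.decoder = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_features)

    def forward(self, x):                 # x: (batch, seq_len, n_features)
        z, _ = self.encoder(x)
        z, _ = self.decoder(z)
        return self.out(z)

torch.manual_seed(0)
normal = torch.sin(torch.linspace(0, 50, 24 * 30)).reshape(30, 24, 1)   # 30 天、每天 24 个点的示意数据
model = BiLSTMAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):                      # 仅用正常数据训练重建
    loss = nn.functional.mse_loss(model(normal), normal)
    opt.zero_grad(); loss.backward(); opt.step()

errors = ((model(normal) - normal) ** 2).mean(dim=(1, 2))
threshold = errors.mean() + 3 * errors.std()            # 预先设定的阈值规则(示意)
anomaly = normal.clone(); anomaly[0, 10:14, 0] += 5.0    # 人为注入一段异常
score = ((model(anomaly[:1]) - anomaly[:1]) ** 2).mean()
print(bool(score > threshold))
```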

【6】 Automation Of Transiting Exoplanet Detection, Identification and Habitability Assessment Using Machine Learning Approaches 标题:利用机器学习方法实现穿越系外行星探测、识别和宜居性评估的自动化 链接:https://arxiv.org/abs/2112.03298

作者:Pawel Pratyush,Akshata Gangrade 机构: Department of Computer Science and Engineering, Maulana Azad National Institute of Technology, Bhopal, India 备注:19 pages, 21 figures 摘要:在人类进化史上,我们正处在一个独特的时间轴上,我们可能能够发现太阳系以外恒星周围类似地球的行星,那里的条件可以支持生命,甚至可以在这些行星上找到生命的证据。近年来,随着美国宇航局、欧空局和其他主要航天机构发射了几颗卫星,我们掌握了大量的数据集,这些数据集可用于训练机器学习模型,从而实现系外行星探测、识别和宜居性确定等艰巨任务的自动化。自动化这些任务可以节省大量的时间,并最大限度地减少人工干预造成的人为错误。为了实现这一目标,我们首先分析开普勒望远镜捕捉到的恒星的光强曲线,以检测显示可能存在行星系统特征的潜在曲线。对于这种检测,在训练传统模型的同时,我们提出了一种叠加GBDT模型,该模型可以同时训练光信号的多个表示形式。随后,我们利用几种最先进的机器学习和集成方法,解决了系外行星识别和宜居性确定的自动化问题。系外行星的识别旨在将假阳性实例与系外行星的实际实例区分开来,而可居住性评估则根据系外行星的可居住特征将其分为不同的星团。此外,我们提出了一个新的指标,称为充分热充分性(ATA)评分,以建立可居住和不可居住实例之间的潜在线性关系。实验结果表明,所提出的叠加GBDT模型在探测凌日系外行星方面优于传统模型。此外,在宜居性分类中纳入ATA分数提高了模型的性能。 摘要:We are at a unique timeline in the history of human evolution where we may be able to discover earth-like planets around stars outside our solar system where conditions can support life or even find evidence of life on those planets. With the launch of several satellites in recent years by NASA, ESA, and other major space agencies, an ample amount of datasets are at our disposal which can be utilized to train machine learning models that can automate the arduous tasks of exoplanet detection, its identification, and habitability determination. Automating these tasks can save a considerable amount of time and minimize human errors due to manual intervention. To achieve this aim, we first analyze the light intensity curves from stars captured by the Kepler telescope to detect the potential curves that exhibit the characteristics of an existence of a possible planetary system. For this detection, along with training conventional models, we propose a stacked GBDT model that can be trained on multiple representations of the light signals simultaneously. Subsequently, we address the automation of exoplanet identification and habitability determination by leveraging several state-of-art machine learning and ensemble approaches. The identification of exoplanets aims to distinguish false positive instances from the actual instances of exoplanets whereas the habitability assessment groups the exoplanet instances into different clusters based on their habitable characteristics. Additionally, we propose a new metric called Adequate Thermal Adequacy (ATA) score to establish a potential linear relationship between habitable and non-habitable instances. Experimental results suggest that the proposed stacked GBDT model outperformed the conventional models in detecting transiting exoplanets. Furthermore, the incorporation of ATA scores in habitability classification enhanced the performance of models.
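下面用 sklearn 给出"对光变曲线的多种表示分别训练 GBDT,再进行堆叠"这一思路的简化示意;其中的两种表示(原始曲线与频域幅值)、合成数据以及用逻辑回归作元学习器都只是演示用的假设,严格的堆叠还应使用折外(out-of-fold)预测。

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# 对同一条光变曲线构造两种表示,各训练一个 GBDT,再堆叠两者的输出概率。
rng = np.random.default_rng(0)
n, length = 400, 64
curves = rng.normal(size=(n, length))
y = rng.integers(0, 2, size=n)
curves[y == 1] -= 0.3 * np.sin(np.linspace(0, 6.28, length))   # 给"候选凌星"类加一个微弱模式

view_raw = curves                                  # 表示 1:原始光变曲线
view_fft = np.abs(np.fft.rfft(curves, axis=1))     # 表示 2:频域幅值

tr, te = slice(0, 300), slice(300, None)
gbdt_raw = GradientBoostingClassifier(random_state=0).fit(view_raw[tr], y[tr])
gbdt_fft = GradientBoostingClassifier(random_state=0).fit(view_fft[tr], y[tr])

stack_tr = np.column_stack([gbdt_raw.predict_proba(view_raw[tr])[:, 1],
                            gbdt_fft.predict_proba(view_fft[tr])[:, 1]])
stack_te = np.column_stack([gbdt_raw.predict_proba(view_raw[te])[:, 1],
                            gbdt_fft.predict_proba(view_fft[te])[:, 1]])
meta = LogisticRegression().fit(stack_tr, y[tr])   # 元学习器对两个视图的输出做融合
print("stacked accuracy:", meta.score(stack_te, y[te]))
```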

分类|识别(4篇)

【1】 Shrub Ensembles for Online Classification 标题:用于在线分类的灌木集成 链接:https://arxiv.org/abs/2112.03723

作者:Sebastian Buschjäger,Sibylle Hess,Katharina Morik 机构: Technische Universiteit Eindhoven 备注:9 pages main content, 13 pages appendix, accepted at AAAI-2022 摘要:在线学习算法已经成为机器学习工具箱中普遍存在的工具,并且经常用于资源受限的小型环境中。最成功的在线学习方法是决策树(DT)集成。DT集成在适应数据变化的同时提供了优异的性能,但它们并没有资源效率。增量树学习器不断向树中添加新节点,但从不删除旧节点,这会随着时间的推移增加内存消耗。另一方面,基于梯度的树学习需要计算整个树上的梯度,这对于中等大小的树来说是非常昂贵的。在本文中,我们提出了一种新的存储效率高的在线分类集成,称为灌木集成,用于资源约束系统。我们的算法在小窗口上训练中小型决策树,并使用随机近端梯度下降来学习这些“灌木”的集合权重。我们对我们的算法进行了理论分析,并对我们的方法在在线环境中的行为进行了广泛的讨论。在12个不同数据集上进行的2,959个实验中,我们将我们的方法与8种最先进的方法进行了比较。我们的灌木丛组合即使在可用内存很少的情况下也能保持出色的性能。我们表明,SE在12例中的7例中提供了更好的准确性记忆权衡,同时在统计上比大多数其他方法具有更好的性能。我们的实施可在https://github.com/sbuschjaeger/se-online . 摘要:Online learning algorithms have become a ubiquitous tool in the machine learning toolbox and are frequently used in small, resource-constraint environments. Among the most successful online learning methods are Decision Tree (DT) ensembles. DT ensembles provide excellent performance while adapting to changes in the data, but they are not resource efficient. Incremental tree learners keep adding new nodes to the tree but never remove old ones increasing the memory consumption over time. Gradient-based tree learning, on the other hand, requires the computation of gradients over the entire tree which is costly for even moderately sized trees. In this paper, we propose a novel memory-efficient online classification ensemble called shrub ensembles for resource-constraint systems. Our algorithm trains small to medium-sized decision trees on small windows and uses stochastic proximal gradient descent to learn the ensemble weights of these `shrubs'. We provide a theoretical analysis of our algorithm and include an extensive discussion on the behavior of our approach in the online setting. In a series of 2,959 experiments on 12 different datasets, we compare our method against 8 state-of-the-art methods. Our Shrub Ensembles retain an excellent performance even when only little memory is available. We show that SE offers a better accuracy-memory trade-off in 7 of 12 cases, while having a statistically significant better performance than most other methods. Our implementation is available under https://github.com/sbuschjaeger/se-online .
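The stochastic proximal gradient step for learning sparse ensemble weights over the "shrubs" can be sketched as below; the squared loss and the L1 regularizer (whose proximal operator is soft-thresholding) are assumptions chosen for illustration rather than the paper's exact objective.

```python
import numpy as np

def prox_l1(w, lam):
    """Soft-thresholding: proximal operator of lam * ||w||_1 (drives shrub weights to zero)."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def update_ensemble_weights(w, shrub_preds, y, lr=0.05, lam=0.01):
    """One stochastic proximal gradient step on a mini-batch (squared loss for illustration).

    shrub_preds: (batch, n_shrubs) predictions of each shrub on the current window
    y:           (batch,) targets, w: (n_shrubs,) ensemble weights
    """
    residual = shrub_preds @ w - y
    grad = shrub_preds.T @ residual / len(y)      # gradient of the data-fit term
    w = prox_l1(w - lr * grad, lr * lam)          # gradient step followed by the prox
    return w
```

Shrubs whose weight is thresholded to zero can then be discarded, which is one way the memory budget could be kept bounded.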

【2】 Handwritten Mathematical Expression Recognition via Attention Aggregation based Bi-directional Mutual Learning 标题:基于注意力聚合的双向交互学习手写数学表达式识别 链接:https://arxiv.org/abs/2112.03603

作者:Xiaohang Bian,Bo Qin,Xiaozhe Xin,Jianwu Li,Xuefeng Su,Yanfeng Wang 机构: Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science and Technology, Beijing Institute of Technology, China, AI Interaction Department, Tencent, China 备注:None 摘要:手写数学表达式识别的目的是从给定的图像中自动生成乳胶序列。目前,基于注意的编解码模型被广泛应用于这项任务中。它们通常以从左到右(L2R)的方式生成目标序列,而不利用从右到左(R2L)的上下文。本文提出了一种基于注意聚合的双向互学习网络(ABM),该网络由一个共享编码器和两个并行逆解码器(L2R和R2L)组成。这两个译码器通过相互蒸馏来增强,在每个训练步骤中涉及一对一的知识转移,充分利用来自两个反向的互补信息。此外,为了处理不同尺度下的数学符号,提出了一种注意力聚合模块(AAM)来有效地集成多尺度覆盖注意。值得注意的是,在推理阶段,假设模型已经从两个反向学习知识,我们只使用L2R分支进行推理,保持原始参数大小和推理速度。大量实验表明,我们提出的方法在没有数据增强和模型融合的情况下,在CROHME 2014、CROHME 2016和CROHME 2019上的识别准确率分别为56.85%、52.92%和53.96%,大大优于最先进的方法。补充资料中提供了源代码。 摘要:Handwritten mathematical expression recognition aims to automatically generate LaTeX sequences from given images. Currently, attention-based encoder-decoder models are widely used in this task. They typically generate target sequences in a left-to-right (L2R) manner, leaving the right-to-left (R2L) contexts unexploited. In this paper, we propose an Attention aggregation based Bi-directional Mutual learning Network (ABM) which consists of one shared encoder and two parallel inverse decoders (L2R and R2L). The two decoders are enhanced via mutual distillation, which involves one-to-one knowledge transfer at each training step, making full use of the complementary information from two inverse directions. Moreover, in order to deal with mathematical symbols in diverse scales, an Attention Aggregation Module (AAM) is proposed to effectively integrate multi-scale coverage attentions. Notably, in the inference phase, given that the model already learns knowledge from two inverse directions, we only use the L2R branch for inference, keeping the original parameter size and inference speed. Extensive experiments demonstrate that our proposed approach achieves the recognition accuracy of 56.85 % on CROHME 2014, 52.92 % on CROHME 2016, and 53.96 % on CROHME 2019 without data augmentation and model ensembling, substantially outperforming the state-of-the-art methods. The source code is available in the supplementary materials.
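The mutual distillation between the L2R and R2L decoders can be sketched as a symmetric KL term between their per-step output distributions; the time-axis flip used to align the two decoders and the temperature are illustrative assumptions (equal-length, unpadded target sequences are assumed).

```python
import torch
import torch.nn.functional as F

def mutual_distillation_loss(logits_l2r, logits_r2l, temperature=1.0):
    """Symmetric KL between the two decoders' per-token distributions.

    logits_l2r: (batch, seq_len, vocab) from the left-to-right decoder
    logits_r2l: (batch, seq_len, vocab) from the right-to-left decoder,
                flipped along the time axis so step t refers to the same target token.
    """
    p = F.log_softmax(logits_l2r / temperature, dim=-1)
    q = F.log_softmax(torch.flip(logits_r2l, dims=[1]) / temperature, dim=-1)
    kl_pq = F.kl_div(p, q.exp(), reduction="batchmean")   # teach L2R with R2L
    kl_qp = F.kl_div(q, p.exp(), reduction="batchmean")   # teach R2L with L2R
    return 0.5 * (kl_pq + kl_qp)
```

At inference only the L2R branch would be kept, as the abstract notes, so this term only adds cost during training.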

【3】 CMA-CLIP: Cross-Modality Attention CLIP for Image-Text Classification 标题:CMA-CLIP:用于图文分类的跨通道注意剪辑 链接:https://arxiv.org/abs/2112.03562

作者:Huidong Liu,Shaoyuan Xu,Jinmiao Fu,Yang Liu,Ning Xie,Chien-chih Wang,Bryan Wang,Yi Sun 机构:Stony Brook University, Stony Brook, NY, USA, Amazon Inc., Seattle, WA, USA 摘要:社交媒体和电子商务等现代网络系统包含以图像和文本表示的丰富内容。利用来自多种模式的信息可以提高机器学习任务(如分类和推荐)的性能。在本文中,我们提出了跨模态注意对比语言图像预训练(CMA-CLIP),这是一种新的框架,它将两种跨模态注意(顺序注意和模态注意)结合起来,以有效地融合图像和文本对中的信息。序列式注意使框架能够捕获图像块和文本标记之间的细粒度关系,而模态式注意则通过其与下游任务的相关性来衡量每个模态。此外,通过添加任务特定的模态注意和多层感知器,我们提出的框架能够执行多模态的多任务分类。我们在一个主要零售网站产品属性(MRWPA)数据集和两个公共数据集Food101和Fashion-Gen上进行了实验。结果表明,CMA-CLIP在多任务分类的MRWPA数据集上,在相同精度水平下,召回率平均比预训练和微调的CLIP高11.9%。它在精度上也比Fashion Gen数据集上的最新方法高出5.5%,并在Food101数据集上实现了具有竞争力的性能。通过详细的消融研究,我们进一步证明了跨模态注意模块的有效性以及我们的方法对图像和文本输入噪声的鲁棒性,这是实践中的一个常见挑战。 摘要:Modern Web systems such as social media and e-commerce contain rich contents expressed in images and text. Leveraging information from multi-modalities can improve the performance of machine learning tasks such as classification and recommendation. In this paper, we propose the Cross-Modality Attention Contrastive Language-Image Pre-training (CMA-CLIP), a new framework which unifies two types of cross-modality attentions, sequence-wise attention and modality-wise attention, to effectively fuse information from image and text pairs. The sequence-wise attention enables the framework to capture the fine-grained relationship between image patches and text tokens, while the modality-wise attention weighs each modality by its relevance to the downstream tasks. In addition, by adding task specific modality-wise attentions and multilayer perceptrons, our proposed framework is capable of performing multi-task classification with multi-modalities. We conduct experiments on a Major Retail Website Product Attribute (MRWPA) dataset and two public datasets, Food101 and Fashion-Gen. The results show that CMA-CLIP outperforms the pre-trained and fine-tuned CLIP by an average of 11.9% in recall at the same level of precision on the MRWPA dataset for multi-task classification. It also surpasses the state-of-the-art method on Fashion-Gen Dataset by 5.5% in accuracy and achieves competitive performance on Food101 Dataset. Through detailed ablation studies, we further demonstrate the effectiveness of both cross-modality attention modules and our method's robustness against noise in image and text inputs, which is a common challenge in practice.

【4】 Extrapolation Frameworks in Cognitive Psychology Suitable for Study of Image Classification Models 标题:适用于图像分类模型研究的认知心理学外推框架 链接:https://arxiv.org/abs/2112.03411

作者:Roozbeh Yousefzadeh,Jessica A. Mollick 机构:Yale Center for Medical Informatics, Yale University, and VA Connecticut Healthcare System, New Haven, CT , Department of Psychiatry, Yale School of Medicine 备注:1st Workshop on Human and Machine Decisions (WHMD 2021) at NeurIPS 2021 摘要:我们研究了深度学习图像分类模型的功能任务,并表明图像分类需要外推能力。这表明,为了理解深度学习,必须开发新的理论,因为当前的理论假设模型只是插值,留下了许多关于它们的问题没有答案。我们研究了通过训练模型从图像中提取的像素空间和特征空间(在其隐藏层中,包括预训练残差神经网络最后一个隐藏层中的64维特征空间),以及通过小波/剪切波提取的特征空间。在所有这些领域中,测试样本都大大超出了训练集的凸包,图像分类需要外推。与深度学习文献相反,在认知科学、心理学和神经科学中,外推和学习通常是同时进行的。此外,据报道,人类视觉认知和行为的许多方面都涉及外推。我们提出了一个新的外推框架,用于深入学习模型的数学研究。在我们的框架中,我们使用术语外推,在训练集的凸包外(在像素空间或特征空间中),但在训练数据定义的特定范围内,以这种特定的方式外推,认知科学的许多研究中定义了相同的外推方式。我们解释说,我们的外推框架可以为深度学习的开放性研究问题提供新的答案,包括其过度参数化、训练机制、分布外检测,等等。我们还发现,在学习任务中,外推的程度可以忽略不计,据报道,深度学习没有简单模型的优势。 摘要:We study the functional task of deep learning image classification models and show that image classification requires extrapolation capabilities. This suggests that new theories have to be developed for the understanding of deep learning as the current theory assumes models are solely interpolating, leaving many questions about them unanswered. We investigate the pixel space and also the feature spaces extracted from images by trained models (in their hidden layers, including the 64-dimensional feature space in the last hidden layer of pre-trained residual neural networks), and also the feature space extracted by wavelets/shearlets. In all these domains, testing samples considerably fall outside the convex hull of training sets, and image classification requires extrapolation. In contrast to the deep learning literature, in cognitive science, psychology, and neuroscience, extrapolation and learning are often studied in tandem. Moreover, many aspects of human visual cognition and behavior are reported to involve extrapolation. We propose a novel extrapolation framework for the mathematical study of deep learning models. In our framework, we use the term extrapolation in this specific way of extrapolating outside the convex hull of training set (in the pixel space or feature space) but within the specific scope defined by the training data, the same way extrapolation is defined in many studies in cognitive science. We explain that our extrapolation framework can provide novel answers to open research problems about deep learning including their over-parameterization, their training regime, out-of-distribution detection, etc. We also see that the extent of extrapolation is negligible in learning tasks where deep learning is reported to have no advantage over simple models.
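Whether a test sample lies inside the convex hull of the training set (in pixel space or in a feature space) can be checked with a standard linear-programming feasibility test, sketched below; this is the textbook formulation, not necessarily the exact procedure used in the paper.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(point, train_points):
    """Test whether `point` is a convex combination of the rows of `train_points`.

    Feasible iff there exist weights w >= 0 with sum(w) = 1 and train_points.T @ w = point.
    """
    n = train_points.shape[0]
    A_eq = np.vstack([train_points.T, np.ones((1, n))])
    b_eq = np.concatenate([point, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success
```

Running this test on held-out samples (or on features from a hidden layer) is one way to quantify how often classification requires extrapolation beyond the training hull.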

3D|3D重建等相关(1篇)

【1】 Gaussian map predictions for 3D surface feature localisation and counting 标题:用于三维地物定位和计数的高斯图预测 链接:https://arxiv.org/abs/2112.03736

作者:Justin Le Louëdec,Grzegorz Cielniak 机构:Lincoln Centre for Autonomous, Systems, University of Lincoln, Lincoln LN,TS, United Kingdom 备注:BMVC 2021 摘要:在本文中,我们建议使用高斯地图表示来估计三维表面特征的精确位置和计数,解决了基于密度估计的最新方法在局部干扰存在时的局限性。高斯贴图指示可能的对象位置,可以直接从关键点注释生成,避免了费力且昂贵的每像素注释。我们将此方法应用于3D球体类对象,这些对象可以投影到2D形状表示中,从而通过神经网络GNet(一种改进的UNet体系结构)进行高效处理,该结构生成曲面特征的可能位置及其精确计数。我们展示了这项技术在草莓瘦果计数中的实际应用,它被用作表型应用中的水果质量测量。从一个公开的数据集中对数百个草莓的3D扫描结果表明,该系统的准确性和精确度优于此应用中基于密度的最新方法。 摘要:In this paper, we propose to employ a Gaussian map representation to estimate precise location and count of 3D surface features, addressing the limitations of state-of-the-art methods based on density estimation which struggle in presence of local disturbances. Gaussian maps indicate probable object location and can be generated directly from keypoint annotations avoiding laborious and costly per-pixel annotations. We apply this method to the 3D spheroidal class of objects which can be projected into 2D shape representation enabling efficient processing by a neural network GNet, an improved UNet architecture, which generates the likely locations of surface features and their precise count. We demonstrate a practical use of this technique for counting strawberry achenes which is used as a fruit quality measure in phenotyping applications. The results of training the proposed system on several hundreds of 3D scans of strawberries from a publicly available dataset demonstrate the accuracy and precision of the system which outperforms the state-of-the-art density-based methods for this application.
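Generating the Gaussian map targets directly from keypoint annotations is straightforward; a minimal sketch follows, with the sigma value and the max-combination of overlapping bumps as illustrative assumptions.

```python
import numpy as np

def gaussian_map(height, width, keypoints, sigma=2.0):
    """Render a target map with a 2D Gaussian bump at every annotated keypoint (row, col)."""
    ys, xs = np.mgrid[0:height, 0:width]
    target = np.zeros((height, width), dtype=np.float32)
    for (r, c) in keypoints:
        bump = np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2.0 * sigma ** 2))
        target = np.maximum(target, bump)   # overlapping features keep the stronger response
    return target

# Counting could then be read off from the predicted map, e.g. via local-maximum
# detection or by dividing its sum by the integral of a single Gaussian (an assumption).
```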

优化|敛散性(3篇)

【1】 PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay 标题:PTR-PPO:具有优先轨迹重放的近端策略优化 链接:https://arxiv.org/abs/2112.03798

作者:Xingxing Liang,Yang Ma,Yanghe Feng,Zhong Liu 机构:College of Systems Engineering, National University of Defense Technology, Changsha, CN 备注:16 pages,10figures, We plan to submit a paper to the conference of IJCAI-2022 摘要:基于策略的深度强化学习算法的数据利用率较低,需要丰富的策略改进经验。提出了一种基于优先级轨迹重放(PTR-PPO)的近端策略优化算法,该算法结合了on策略和off策略方法,通过对旧策略生成的轨迹重放进行优先级排序来提高采样效率。我们首先根据轨迹的特点设计了三个轨迹优先级:前两个是基于一步经验广义优势估计(GAE)值的最大和平均轨迹优先级,最后一个是基于归一化未贴现累积奖励的奖励轨迹优先级。然后,我们将优先轨迹重放引入到PPO算法中,提出了一种截断重要性权重的方法来克服多步经验下重要性权重过大带来的高方差,并设计了非策略条件下PPO的策略改进损失函数。我们评估了PTR-PPO在一组Atari离散控制任务中的性能,实现了最先进的性能。此外,通过分析训练期间优先级内存中不同位置的优先级变化热图,我们发现内存大小和卷展栏长度会对轨迹优先级的分布产生显著影响,从而影响算法的性能。 摘要:On-policy deep reinforcement learning algorithms have low data utilization and require significant experience for policy improvement. This paper proposes a proximal policy optimization algorithm with prioritized trajectory replay (PTR-PPO) that combines on-policy and off-policy methods to improve sampling efficiency by prioritizing the replay of trajectories generated by old policies. We first design three trajectory priorities based on the characteristics of trajectories: the first two being max and mean trajectory priorities based on one-step empirical generalized advantage estimation (GAE) values and the last being reward trajectory priorities based on normalized undiscounted cumulative reward. Then, we incorporate the prioritized trajectory replay into the PPO algorithm, propose a truncated importance weight method to overcome the high variance caused by large importance weights under multistep experience, and design a policy improvement loss function for PPO under off-policy conditions. We evaluate the performance of PTR-PPO in a set of Atari discrete control tasks, achieving state-of-the-art performance. In addition, by analyzing the heatmap of priority changes at various locations in the priority memory during training, we find that memory size and rollout length can have a significant impact on the distribution of trajectory priorities and, hence, on the performance of the algorithm.
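The three trajectory priorities described in the abstract can be sketched from standard GAE computations as below; the use of absolute advantages, the zero terminal bootstrap, and the normalisation of the reward priority across the replay memory are assumptions for illustration.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over a single trajectory."""
    values = np.append(values, 0.0)           # terminal bootstrap value assumed to be 0
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv, running = np.zeros(len(rewards)), 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv

def trajectory_priorities(rewards, values):
    """Max-GAE, mean-GAE, and cumulative-reward priorities for one stored trajectory."""
    adv = gae(rewards, values)
    return {
        "max_gae": np.max(np.abs(adv)),
        "mean_gae": np.mean(np.abs(adv)),
        "reward": np.sum(rewards),             # to be normalised across the replay memory
    }
```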

【2】 Convergence Guarantees for Deep Epsilon Greedy Policy Learning 标题:深度Epsilon贪婪策略学习的收敛性保证 链接:https://arxiv.org/abs/2112.03376

作者:Michael Rawson,Radu Balan 机构:Department of Mathematics, University of Maryland at College Park, Maryland, USA 摘要:政策学习是一个快速发展的领域。由于机器人和计算机控制着日常生活,因此需要最小化和控制它们的错误率。有许多策略学习方法和可证明的错误率。我们展示了一个错误或遗憾界和收敛的深Epsilon贪婪方法选择行动与神经网络的预测。在实际数据集MNIST的实验中,我们构造了一个非线性强化学习问题。我们见证了在高噪声或低噪声条件下,有些方法收敛,有些不收敛,这与我们的收敛性证明是一致的。 摘要:Policy learning is a quickly growing area. As robotics and computers control day-to-day life, their error rate needs to be minimized and controlled. There are many policy learning methods and provable error rates that accompany them. We show an error or regret bound and convergence of the Deep Epsilon Greedy method which chooses actions with a neural network's prediction. In experiments with the real-world dataset MNIST, we construct a nonlinear reinforcement learning problem. We witness how with either high or low noise, some methods do and some do not converge which agrees with our proof of convergence.
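The action-selection rule analysed in the paper is the usual epsilon-greedy choice over a neural network's predicted rewards; a minimal sketch follows, where `value_net` is an assumed callable returning one predicted reward per action.

```python
import numpy as np

def deep_epsilon_greedy(context, value_net, n_actions, epsilon=0.1, rng=None):
    """Pick a random action with probability epsilon, otherwise the network's argmax action."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))        # exploration
    return int(np.argmax(value_net(context)))      # exploitation of the network's prediction
```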

【3】 Efficient Calibration of Multi-Agent Market Simulators from Time Series with Bayesian Optimization 标题:基于贝叶斯优化的时间序列多智能体市场模拟器的有效校准 链接:https://arxiv.org/abs/2112.03874

作者:Yuanlu Bai,Henry Lam,Svitlana Vyetrenko,Tucker Balch 机构:Columbia University, USA, J.P.Morgan AI Research, USA 摘要:多代理市场模拟通常用于为下游机器学习或强化学习任务创建环境,例如在将交易策略部署到实时交易之前对其进行训练或测试。在电子交易市场中,通常只能直接观察到由多个市场参与者相互作用产生的价格或交易量时间序列。因此,需要校准多智能体市场环境,以便模拟智能体交互产生的时间序列类似于历史-这相当于解决一个高度复杂的大规模优化问题。在本文中,我们提出了一个简单而有效的框架,用于根据历史时间序列观测值校准多智能体市场模拟器参数。首先,我们考虑一个新的资格设置的概念绕过潜在的不可识别性问题。其次,我们推广了带有Bonferroni校正的两样本Kolmogorov-Smirnov(K-S)检验来检验两个高维时间序列分布之间的相似性,这给出了一个简单但有效的时间序列样本集之间的距离度量。第三,我们建议使用贝叶斯优化(BO)和信赖域BO(TuRBO)来最小化上述距离度量。最后,我们通过数值实验证明了该框架的有效性。 摘要:Multi-agent market simulation is commonly used to create an environment for downstream machine learning or reinforcement learning tasks, such as training or testing trading strategies before deploying them to real-time trading. In electronic trading markets only the price or volume time series, that result from interaction of multiple market participants, are typically directly observable. Therefore, multi-agent market environments need to be calibrated so that the time series that result from interaction of simulated agents resemble historical -- which amounts to solving a highly complex large-scale optimization problem. In this paper, we propose a simple and efficient framework for calibrating multi-agent market simulator parameters from historical time series observations. First, we consider a novel concept of eligibility set to bypass the potential non-identifiability issue. Second, we generalize the two-sample Kolmogorov-Smirnov (K-S) test with Bonferroni correction to test the similarity between two high-dimensional time series distributions, which gives a simple yet effective distance metric between the time series sample sets. Third, we suggest using Bayesian optimization (BO) and trust-region BO (TuRBO) to minimize the aforementioned distance metric. Finally, we demonstrate the efficiency of our framework using numerical experiments.
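The dimension-wise two-sample K-S test with Bonferroni correction can be sketched with SciPy as below; how the per-dimension statistics are aggregated into the distance minimised by BO/TuRBO is an assumption here (the maximum statistic is used for illustration).

```python
from scipy.stats import ks_2samp

def ks_bonferroni_distance(sim_series, hist_series, alpha=0.05):
    """Dimension-wise two-sample K-S tests with a Bonferroni-corrected significance level.

    sim_series, hist_series: arrays of shape (n_samples, n_features), e.g. summary
    statistics of simulated and historical price/volume series (an assumption).
    Returns a simple distance (largest K-S statistic) and whether every test passes
    at the corrected level alpha / n_features.
    """
    n_features = sim_series.shape[1]
    results = [ks_2samp(sim_series[:, j], hist_series[:, j]) for j in range(n_features)]
    stats = [r.statistic for r in results]
    pvals = [r.pvalue for r in results]
    passed = all(p > alpha / n_features for p in pvals)   # Bonferroni correction
    return max(stats), passed
```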

预测|估计(9篇)

【1】 CapsProm: A Capsule Network For Promoter Prediction 标题:CapsProm:一种启动子预测的胶囊网络 链接:https://arxiv.org/abs/2112.03710

作者:Lauro Moraes,Pedro Silva,Eduardo Luz,Gladston Moreira 机构:Computing Department, Federal University of Ouro Preto, Campus Morro do Cruzeiro, Ouro Preto-MG, Brazil 摘要:在生物信息学领域,在DNA序列中定位启动子区域是至关重要的。这是一个在文献中广泛研究的问题,但尚未完全解决。一些研究人员利用卷积网络取得了显著的成果,这使得能够从DNA链中自动提取特征。然而,一种可以推广到几种生物体的通用架构尚未实现,因此,需要研究人员为每种评估的新生物体寻找新架构和超参数。在这项工作中,我们提出了一种基于胶囊网络的多功能体系结构,它可以准确地识别来自七种不同生物(真核生物和原核生物)的原始DNA数据中的启动子序列。我们的模型,CapsProm,可以帮助生物体之间的学习转移,并扩大其适用性。此外,CapsProm显示出竞争性结果,在七个测试数据集中有五个(F1分数)超过了基线方法。模型和源代码可在https://github.com/lauromoraes/CapsNet-promoter. 摘要:Locating the promoter region in DNA sequences is of paramount importance in the field of bioinformatics. This is a problem widely studied in the literature, however, not yet fully resolved. Some researchers have presented remarkable results using convolution networks, that allowed the automatic extraction of features from a DNA chain. However, a universal architecture that could generalize to several organisms has not yet been achieved, and thus, requiring researchers to seek new architectures and hyperparameters for each new organism evaluated. In this work, we propose a versatile architecture, based on capsule network, that can accurately identify promoter sequences in raw DNA data from seven different organisms, eukaryotic, and prokaryotic. Our model, the CapsProm, could assist in the transfer of learning between organisms and expand its applicability. Furthermore the CapsProm showed competitive results, overcoming the baseline method in five out of seven of the tested datasets (F1-score). The models and source code are made available at https://github.com/lauromoraes/CapsNet-promoter.

【2】 State-of-the-art predictive and prescriptive analytics for IEEE CIS 3rd Technical Challenge 标题:IEEE CIS第三次技术挑战的最新预测和规范分析 链接:https://arxiv.org/abs/2112.03595

作者:Mahdi Abolghasemi,Rasul Esmaeilbeigi 摘要:在本文中,我们描述了我们提出的方法,以接近IEEE CIS第三次技术挑战中引入的预测+优化挑战。预测模型采用LightGBM模型集成,规定性分析采用数学优化有效规定解决方案,使多个场景的平均成本最小化。我们的解决方案在优化方面排名第一,在预测挑战方面排名第二。 摘要:In this paper, we describe our proposed methodology to approach the predict+optimise challenge introduced in the IEEE CIS 3rd Technical Challenge. The predictive model employs an ensemble of LightGBM models and the prescriptive analysis employs mathematical optimisation to efficiently prescribe solutions that minimise the average cost over multiple scenarios. Our solutions ranked 1st in the optimisation and 2nd in the prediction challenge of the competition.

【3】 Predicting the Travel Distance of Patients to Access Healthcare using Deep Neural Networks 标题:基于深度神经网络的患者就医行程预测 链接:https://arxiv.org/abs/2112.03541

作者:Li-Chin Chen,Ji-Tian Sheu,Yuh-Jue Chuang,Yu Tsao 机构:Innovation, Academia Sinica, Academia Road, Section , Nankang, shown that the burden of travel makes treatment an, unaffordable option [,]. Furthermore, [,] showed that patient, health outcomes (for example, survival rates, length of 备注:accepted by IEEE Journal of Translational Engineering in Health and Medicine 摘要:目标:在卫生政策设计期间,改善地域准入仍然是决定区域医疗资源充足性的关键问题。然而,患者的选择可能是各种因素复杂互动的结果。本研究的目的是提出一种深度神经网络方法,以模拟患者选择获得护理的旅行距离的复杂决策,这是资源分配决策的一个重要指标。方法:我们使用台湾4年全国保险数据,并累积早期文献中讨论的可能特征。本研究建议使用基于卷积神经网络(CNN)的框架进行预测。模型性能通过其他机器学习方法进行了测试。使用综合梯度(IG)分析特征权重,进一步解释了所提出的框架。结果:我们成功地证明了使用基于CNN的框架预测患者旅行距离的有效性,准确度为0.968,AUC为0.969,敏感性为0.960,特异性为0.989。基于CNN的框架优于所有其他方法。在本研究中,免疫球蛋白的权重是可以解释的;然而,这种关系并不符合公共卫生领域的已知指标,类似于共识。结论:我们的研究结果证明了基于深度学习的出行距离预测模型的可行性。它有可能指导资源配置方面的决策。 摘要:Objective: Improving geographical access remains a key issue in determining the sufficiency of regional medical resources during health policy design. However, patient choices can be the result of the complex interactivity of various factors. The aim of this study is to propose a deep neural network approach to model the complex decision of patient choice in travel distance to access care, which is an important indicator for policymaking in allocating resources. Method: We used the 4-year nationwide insurance data of Taiwan and accumulated the possible features discussed in earlier literature. This study proposes the use of a convolutional neural network (CNN)-based framework to make predictions. The model performance was tested against other machine learning methods. The proposed framework was further interpreted using Integrated Gradients (IG) to analyze the feature weights. Results: We successfully demonstrated the effectiveness of using a CNN-based framework to predict the travel distance of patients, achieving an accuracy of 0.968, AUC of 0.969, sensitivity of 0.960, and specificity of 0.989. The CNN-based framework outperformed all other methods. In this research, the IG weights are potentially explainable; however, the relationship does not correspond to known indicators in public health, similar to common consensus. Conclusions: Our results demonstrate the feasibility of the deep learning-based travel distance prediction model. It has the potential to guide policymaking in resource allocation.
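A minimal Integrated Gradients sketch (not the paper's implementation) for attributing a prediction to input features; the zero baseline, the target class index, and the number of path steps are assumptions.

```python
import torch

def integrated_gradients(model, x, baseline=None, target=0, steps=50):
    """Approximate IG attributions for one unbatched input x along a straight-line path."""
    baseline = torch.zeros_like(x) if baseline is None else baseline
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)          # (steps, *x.shape), acts as a batch
    path.requires_grad_(True)
    out = model(path)[:, target].sum()
    out.backward()
    avg_grad = path.grad.mean(dim=0)                   # Riemann approximation of the path integral
    return (x - baseline) * avg_grad                   # per-feature attributions
```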

【4】 Enhanced Exploration in Neural Feature Selection for Deep Click-Through Rate Prediction Models via Ensemble of Gating Layers 标题:基于门层集成的深度点击率预测模型神经特征选择的改进探索 链接:https://arxiv.org/abs/2112.03487

作者:Lin Guan,Xia Xiao,Ming Chen,Youlong Cheng 机构: School of Computing & AI, Arizona State University, ByteDance 摘要:特征选择是开发工业级点击率(CTR)预测系统的关键步骤。神经特征选择(NFS)的目标是选择具有最佳解释力的相对较小的特征子集,作为去除冗余特征和降低计算成本的手段。受基于梯度的神经体系结构搜索(NAS)和网络修剪方法的启发,人们通过门控方法解决了NFS问题,门控方法插入一组可微二进制门,以删除信息量较小的特征。二进制门与网络参数一起以有效的端到端方式进行优化。在本文中,我们从勘探开发的角度分析了基于梯度的解决方案,并使用实证结果表明,选通方法可能会受到勘探不足的影响。为了提高基于梯度的解的探索能力,我们提出了一种简单而有效的集成学习方法,称为集成门。我们选择两个公共数据集,即Avazu和Criteo,来评估这种方法。我们的实验表明,在不增加任何计算开销或引入任何超参数(集合大小除外)的情况下,我们的方法能够持续改进选通方法,并在具有三种不同底层深度CTR预测模型的两个数据集上找到更好的特征子集。 摘要:Feature selection has been an essential step in developing industry-scale deep Click-Through Rate (CTR) prediction systems. The goal of neural feature selection (NFS) is to choose a relatively small subset of features with the best explanatory power as a means to remove redundant features and reduce computational cost. Inspired by gradient-based neural architecture search (NAS) and network pruning methods, people have tackled the NFS problem with Gating approach that inserts a set of differentiable binary gates to drop less informative features. The binary gates are optimized along with the network parameters in an efficient end-to-end manner. In this paper, we analyze the gradient-based solution from an exploration-exploitation perspective and use empirical results to show that Gating approach might suffer from insufficient exploration. To improve the exploration capacity of gradient-based solutions, we propose a simple but effective ensemble learning approach, named Ensemble Gating. We choose two public datasets, namely Avazu and Criteo, to evaluate this approach. Our experiments show that, without adding any computational overhead or introducing any hyper-parameter (except the size of the ensemble), our method is able to consistently improve Gating approach and find a better subset of features on the two datasets with three different underlying deep CTR prediction models.

【5】 A Unified Framework for Multi-distribution Density Ratio Estimation 标题:一种多分布密度比估计的统一框架 链接:https://arxiv.org/abs/2112.03440

作者:Lantao Yu,Yujia Jin,Stefano Ermon 机构:Department of Computer Science, Stanford University, Department of Management Science and Engineering 摘要:二进制密度比估计(DRE),即在给定经验样本的情况下估计比率 $p_1/p_2$ 的问题,为许多最先进的机器学习算法(如对比表示学习和协变量移位适应)提供了基础。在这项工作中,我们考虑一个广义的设置,其中给定来自多个分布 $p_1, \ldots, p_k$($k > 2$)的样本,我们的目的是有效地估计所有的分布对之间的密度比。这种推广带来了重要的新应用,如估计多个随机变量之间的统计差异(如多分布$f$-散度),以及通过多重重要性抽样进行偏差校正。然后,我们从Bregman散度最小化的角度发展了一个通用框架,其中每个严格凸多元函数都会导致多分布DRE的适当损失。此外,我们重新推导了多分布密度比估计和类概率估计之间的理论联系,证明了在多分布DRE中使用任何严格合适的带连接函数的评分规则组合的合理性。我们表明,我们的框架导致了严格概括二进制DRE中对应方法的方法,以及在各种下游任务中表现出类似或优异性能的新方法。 摘要:Binary density ratio estimation (DRE), the problem of estimating the ratio $p_1/p_2$ given their empirical samples, provides the foundation for many state-of-the-art machine learning algorithms such as contrastive representation learning and covariate shift adaptation. In this work, we consider a generalized setting where given samples from multiple distributions $p_1, \ldots, p_k$ (for $k > 2$), we aim to efficiently estimate the density ratios between all pairs of distributions. Such a generalization leads to important new applications such as estimating statistical discrepancy among multiple random variables like multi-distribution $f$-divergence, and bias correction via multiple importance sampling. We then develop a general framework from the perspective of Bregman divergence minimization, where each strictly convex multivariate function induces a proper loss for multi-distribution DRE. Moreover, we rederive the theoretical connection between multi-distribution density ratio estimation and class probability estimation, justifying the use of any strictly proper scoring rule composite with a link function for multi-distribution DRE. We show that our framework leads to methods that strictly generalize their counterparts in binary DRE, as well as new methods that show comparable or superior performance on various downstream tasks.

【6】 Differentiable Generalised Predictive Coding 标题:可微广义预测编码 链接:https://arxiv.org/abs/2112.03378

作者:André Ofner,Sebastian Stober 机构:Otto-von-Guericke University, Magdeburg, Germany 摘要:本文讨论了与神经科学中的神经过程理论一致的可微动力学模型,该模型将大脑功能视为分层过滤,旨在改进解释观察结果的内部生成模型。我们的工作扩展了精确梯度预测编码的现有实现,并允许与深度神经网络集成以实现非线性潜在状态参数化。与梯度下降结合误差反向传播相比,这种基于梯度的预测编码通过优化从数据向潜在状态传播的精度加权预测误差,在每一层局部优化神经网络。预测从潜在状态向较低层反向流动。这里提出的模型,GPC,使用精确的梯度来学习低潜态的层次和动力学预测。分层预测对感知内容及其结构进行编码。动态预测处理编码内容的变化。因此,层次和动态预测解决了相同潜在状态的不同方面。由于潜在状态的变化受其所代表的内容的影响,反之亦然,因此这两种途径相互作用,并允许跨时空尺度甚至向后编码内容动态依赖性的表示。我们将GPC应用于具有自适应采样率的序列数据上的各种感知任务。我们讨论了放宽线性分层模型布局假设,支持任意图结构的可能性。最后,我们勾勒出在嵌套的时空层次结构中有效感知和规划的想法,并讨论与大脑中马尔可夫毯子的联系。 摘要:This paper deals with differentiable dynamical models congruent with neural process theories in neuroscience that cast brain function as hierarchical filtering aiming at the refinement of an internal generative model explaining observations. Our work extends existing implementations of predictive coding with exact gradients and allows integration with deep neural networks for non-linear latent state parameterization. In contrast to Gradient Descent in combination with error backpropagation, such gradient based predictive coding optimises neural networks locally in each layer by optimising precision-weighted prediction errors that propagate from data towards latent states. Predictions flow backwards, from latent states towards lower layers. The model suggested here, GPC, uses exact gradients to learn hierarchical and dynamical predictions of lower latent states. Hierarchical predictions encode the perceived content and its structure. Dynamical predictions address changes in the encoded content. As a result, hierarchical and dynamical predictions address different aspects of the same latent states. Since changes in latent states are influenced by the content they represent and vice versa, both pathways interact and allow to encode representations of content-dynamics dependencies across spatio-temporal scales and even backwards in time. We apply GPC to various perception tasks on sequential data with adaptive sampling rates. We discuss possibilities to relax the assumption of linearly hierarchical model layout in favour of arbitrary graph structure. Finally, we sketch out ideas for efficient perception and planning in nested spatio-temporal hierarchies and discuss the connection to Markov Blankets in the brain.

【7】 Modeling and Predicting Blood Flow Characteristics through Double Stenosed Artery from CFD simulation using Deep Learning Models 标题:基于深度学习模型的CFD模拟双狭窄动脉血流特性建模与预测 链接:https://arxiv.org/abs/2112.03698

作者:Ishat Raihan Jamil,Mayeesha Humaira 机构:Department of Mechanical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, Department of Computer Science and Engineering, Ahsanullah University of Science and Technology 摘要:为双狭窄动脉模型的计算流体动力学(CFD)建立患者特定的有限元分析(FEA)模型需要时间和精力,限制了医生在时间紧迫的医疗应用中快速响应的能力。这些问题可以通过训练深度学习(DL)模型来解决,以使用CFD模拟不同配置的简化双狭窄动脉模型生成的数据集来学习和预测血流特征。当通过IVUS成像得出的实际双狭窄动脉模型比较血流模式时,发现在以前的研究工作中广泛使用的狭窄颈部几何形状的正弦近似值不能有效地表示真实收缩的影响。因此,提出了一种新的缩颈几何表示方法,该方法在广义简化模型方面优于前一种假设。动脉管腔直径和流量参数沿血管长度的顺序变化为使用LSTM和GRU-DL模型提供了机会。然而,对于短长度双收缩动脉的小数据集,基本神经网络模型在大多数流动特性方面优于专门的RNN。另一方面,LSTM可以更好地预测波动较大的流动特性,如血管长度上的血压变化。尽管GRU模型在数据集中的所有血管属性的训练和测试中具有良好的整体精度,但在所有情况下,GRU模型在单个血管流量预测方面的表现都不佳。结果还表明,任何模型中的每个属性都需要单独优化的超参数,而不是旨在通过一组超参数在所有输出中实现整体良好性能。 摘要:Establishing patient-specific finite element analysis (FEA) models for computational fluid dynamics (CFD) of double stenosed artery models involves time and effort, restricting physicians' ability to respond quickly in time-critical medical applications. Such issues might be addressed by training deep learning (DL) models to learn and predict blood flow characteristics using a dataset generated by CFD simulations of simplified double stenosed artery models with different configurations. When blood flow patterns are compared through an actual double stenosed artery model, derived from IVUS imaging, it is revealed that the sinusoidal approximation of stenosed neck geometry, which has been widely used in previous research works, fails to effectively represent the effects of a real constriction. As a result, a novel geometric representation of the constricted neck is proposed which, in terms of a generalized simplified model, outperforms the former assumption. The sequential change in artery lumen diameter and flow parameters along the length of the vessel presented opportunities for the use of LSTM and GRU DL models. However, with the small dataset of short lengths of doubly constricted blood arteries, the basic neural network model outperforms the specialized RNNs for most flow properties. LSTM, on the other hand, performs better for predicting flow properties with large fluctuations, such as varying blood pressure over the length of the vessels. Despite having good overall accuracies in training and testing across all the properties for the vessels in the dataset, the GRU model underperforms for an individual vessel flow prediction in all cases. The results also point to the need of individually optimized hyperparameters for each property in any model rather than aiming to achieve overall good performance across all outputs with a single set of hyperparameters.

【8】 A generalization gap estimation for overparameterized models via Langevin functional variance 标题:基于朗之万函数方差的过参数模型泛化缺口估计 链接:https://arxiv.org/abs/2112.03660

作者:Akifumi Okuno,Keisuke Yano 机构:The Institute of Statistical Mathematics, RIKEN Center for Advanced Intelligence Project 备注:21 pages, no figure 摘要:本文讨论过参数化模型(如神经网络)的泛化差距估计,即泛化误差和经验误差之间的差异。我们首先表明,函数方差是定义广泛适用的信息标准的一个关键概念,即使在常规理论无法应用的过参数化环境中,它也表征了泛化差距。接下来,我们提出了一种计算效率高的函数方差近似,即函数方差的朗之万近似(Langevin-FV)。该方法利用平方损失函数的一阶梯度,而不是二阶梯度;因此,它可以高效地计算并与基于梯度的优化算法一致地实现。我们在数值上证明了Langevin FV在估计过参数线性回归和非线性神经网络模型的泛化差距方面的作用。 摘要:This paper discusses estimating the generalization gap, the difference between a generalization error and an empirical error, for overparameterized models (e.g., neural networks). We first show that a functional variance, a key concept in defining a widely-applicable information criterion, characterizes the generalization gap even in overparameterized settings, where a conventional theory cannot be applied. We next propose a computationally efficient approximation of the functional variance, a Langevin approximation of the functional variance (Langevin FV). This method leverages the 1st-order but not the 2nd-order gradient of the squared loss function; so, it can be computed efficiently and implemented consistently with gradient-based optimization algorithms. We demonstrate the Langevin FV numerically in estimating generalization gaps of overparameterized linear regression and non-linear neural network models.

【9】 Private Robust Estimation by Stabilizing Convex Relaxations 标题:稳定凸松弛的私人稳健估计 链接:https://arxiv.org/abs/2112.03548

作者:Pravesh K. Kothari,Pasin Manurangsi,Ameya Velingker 摘要:我们给出了第一个多项式时间和样本$(epsilon,delta)$-差分私有(DP)算法,用于在存在常数部分敌对异常值的情况下估计平均值、协方差和高阶矩。我们的算法成功地应用于满足鲁棒估计两个已被充分研究的性质的分布族:方向矩的可证明次高斯性和二次多项式的可证明超压缩性。我们的恢复保证适用于“右仿射不变范数”:平均值的马氏距离、乘法谱和相对Frobenius距离的协方差保证和高阶矩的内射范数。以前的工作获得了具有有界协方差的次高斯分布均值估计的私有鲁棒算法。对于协方差估计,我们的算法是第一个在没有任何条件数假设的情况下成功的高效算法(即使在没有异常值的情况下)。我们的算法产生于一个新的框架,该框架提供了一个通用的蓝图,用于修改鲁棒估计的凸松弛,以在适当的参数范数下满足强大的最坏情况稳定性保证,只要算法在运行中产生正确性的见证。我们验证了标准平方和(SoS)半定规划松弛的鲁棒估计修改的这种保证。我们的隐私保证是通过将稳定性保证与一种新的“估计相关”噪声注入机制相结合而获得的,在这种机制中,噪声以估计协方差的特征值进行缩放。我们相信,该框架将更普遍地用于获得稳健估计的DP对应项。独立于我们的工作,Ashtiani和Liaw[AL21]还获得了高斯分布的多项式时间和样本私有鲁棒估计算法。 摘要:We give the first polynomial time and sample $(epsilon, delta)$-differentially private (DP) algorithm to estimate the mean, covariance and higher moments in the presence of a constant fraction of adversarial outliers. Our algorithm succeeds for families of distributions that satisfy two well-studied properties in prior works on robust estimation: certifiable subgaussianity of directional moments and certifiable hypercontractivity of degree 2 polynomials. Our recovery guarantees hold in the "right affine-invariant norms": Mahalanobis distance for mean, multiplicative spectral and relative Frobenius distance guarantees for covariance and injective norms for higher moments. Prior works obtained private robust algorithms for mean estimation of subgaussian distributions with bounded covariance. For covariance estimation, ours is the first efficient algorithm (even in the absence of outliers) that succeeds without any condition-number assumptions. Our algorithms arise from a new framework that provides a general blueprint for modifying convex relaxations for robust estimation to satisfy strong worst-case stability guarantees in the appropriate parameter norms whenever the algorithms produce witnesses of correctness in their run. We verify such guarantees for a modification of standard sum-of-squares (SoS) semidefinite programming relaxations for robust estimation. Our privacy guarantees are obtained by combining stability guarantees with a new "estimate dependent" noise injection mechanism in which noise scales with the eigenvalues of the estimated covariance. We believe this framework will be useful more generally in obtaining DP counterparts of robust estimators. Independently of our work, Ashtiani and Liaw [AL21] also obtained a polynomial time and sample private robust estimation algorithm for Gaussian distributions.

其他神经网络|深度学习|模型|建模(23篇)

【1】 Variance-Aware Weight Initialization for Point Convolutional Neural Networks 标题:点卷积神经网络的方差感知权值初始化 链接:https://arxiv.org/abs/2112.03777

作者:Pedro Hermosilla,Michael Schelling,Tobias Ritschel,Timo Ropinski 机构:Ulm University, University College London 摘要:适当的权值初始化对于成功训练神经网络至关重要。最近,通过基于批次统计数据对每个层进行简单的规范化,批次规范化减少了权重初始化的作用。不幸的是,批量规范化在应用于小批量时有几个缺点,因为在点云上学习时需要它们来应对内存限制。虽然有充分依据的权重初始化策略可以使批量标准化变得不必要,从而避免这些缺点,但对于点卷积网络还没有提出这样的方法。为了填补这一空白,我们提出了一个框架来统一大量的连续卷积。这使我们的主要贡献,方差感知权重初始化。我们表明,这种初始化可以避免批处理规范化,同时获得类似的性能,在某些情况下,还可以获得更好的性能。 摘要:Appropriate weight initialization has been of key importance to successfully train neural networks. Recently, batch normalization has diminished the role of weight initialization by simply normalizing each layer based on batch statistics. Unfortunately, batch normalization has several drawbacks when applied to small batch sizes, as they are required to cope with memory limitations when learning on point clouds. While well-founded weight initialization strategies can render batch normalization unnecessary and thus avoid these drawbacks, no such approaches have been proposed for point convolutional networks. To fill this gap, we propose a framework to unify the multitude of continuous convolutions. This enables our main contribution, variance-aware weight initialization. We show that this initialization can avoid batch normalization while achieving similar and, in some cases, better performance.

【2】 On the Effectiveness of Mode Exploration in Bayesian Model Averaging for Neural Networks 标题:神经网络贝叶斯模型平均中模式探索的有效性 链接:https://arxiv.org/abs/2112.03773

作者:John T. Holodnak,Allan B. Wollaber 机构: 1MassachusettsInstituteofTechnology 备注:Presented at the ICML 2021 Workshop on Uncertainty and Robustness in Deep Learning 摘要:在监督学习环境中使用深度神经网络产生校准预测概率的多种技术已经出现,这些技术利用各种方法来集成在循环训练或从多个随机起始点(深度集成)进行训练期间发现的各种解。然而,只有有限的工作调查了探索每个不同解决方案(后验模式)周围局部区域的效用。在CIFAR-10数据集上使用三种著名的深度体系结构,我们评估了几种探索权重空间局部区域的简单方法,包括Brier分数、精度和预期校准误差。我们考虑贝叶斯推理技术(变分推理和Hamiltonian Monte Carlo应用到SOFTMAX输出层),以及利用随机梯度下降轨迹接近Opima。虽然向集成中添加单独的模式可以均匀地提高性能,但我们表明,与没有模式探索的集成相比,这里考虑的简单模式探索方法几乎没有改进。 摘要:Multiple techniques for producing calibrated predictive probabilities using deep neural networks in supervised learning settings have emerged that leverage approaches to ensemble diverse solutions discovered during cyclic training or training from multiple random starting points (deep ensembles). However, only a limited amount of work has investigated the utility of exploring the local region around each diverse solution (posterior mode). Using three well-known deep architectures on the CIFAR-10 dataset, we evaluate several simple methods for exploring local regions of the weight space with respect to Brier score, accuracy, and expected calibration error. We consider both Bayesian inference techniques (variational inference and Hamiltonian Monte Carlo applied to the softmax output layer) as well as utilizing the stochastic gradient descent trajectory near optima. While adding separate modes to the ensemble uniformly improves performance, we show that the simple mode exploration methods considered here produce little to no improvement over ensembles without mode exploration.

【3】 Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds 标题:基于层叠的奇异参数空间建模与求解 链接:https://arxiv.org/abs/2112.03734

作者:Pascal Mattia Esser,Frank Nielsen 备注:A preliminary version of this work was presented at NeurIPS 2021 as a Spotlight in the 13th Annual Workshop on Optimization for Machine Learning (OPT2021) 摘要:在分析参数统计模型时,一种有用的方法是对参数空间进行几何建模。然而,即使对于非常简单和常用的分层模型,如统计混合或随机深层神经网络,在参数空间中显示非光滑邻域的奇点处,流形的光滑性假设也被违反。这些奇异模型已经在学习动力学的背景下进行了分析,其中奇异点可以作为学习轨迹上的吸引子,从而对模型的收敛速度产生负面影响。我们提出了一种通过使用层叠(代数拓扑学中的一个概念)对奇异参数空间进行形式化建模来避免奇异性问题的通用方法。我们利用特定层理具有分辨率方法的性质构造奇异空间的光滑流形近似。我们的经验表明,在光滑流形近似上使用(自然)梯度下降代替奇异空间可以避免吸引子行为,从而提高学习的收敛速度。 摘要:When analyzing parametric statistical models, a useful approach consists in modeling geometrically the parameter space. However, even for very simple and commonly used hierarchical models like statistical mixtures or stochastic deep neural networks, the smoothness assumption of manifolds is violated at singular points which exhibit non-smooth neighborhoods in the parameter space. These singular models have been analyzed in the context of learning dynamics, where singularities can act as attractors on the learning trajectory and, therefore, negatively influence the convergence speed of models. We propose a general approach to circumvent the problem arising from singularities by using stratifolds, a concept from algebraic topology, to formally model singular parameter spaces. We use the property that specific stratifolds are equipped with a resolution method to construct a smooth manifold approximation of the singular space. We empirically show that using (natural) gradient descent on the smooth manifold approximation instead of the singular space allows us to avoid the attractor behavior and therefore improve the convergence speed in learning.

【4】 Flexible Networks for Learning Physical Dynamics of Deformable Objects 标题:用于学习可变形物体物理动力学的柔性网络 链接:https://arxiv.org/abs/2112.03728

作者:Jinhyung Park,DoHae Lee,In-Kwon Lee 机构: Yonsei University 摘要:使用基于粒子的表示学习可变形物体的物理动力学一直是机器学习中许多计算模型的目标。虽然一些最先进的模型在模拟环境中实现了这一目标,但大多数现有模型都有一个先决条件,即输入是有序点集的序列,即每个点集中的点在整个输入序列中的顺序必须相同。这限制了模型推广到现实世界的数据,这被认为是一个无序点集序列。在本文中,我们提出了一个称为时间点网(TP-Net)的模型,该模型通过直接使用一系列无序点集来推断基于粒子表示的可变形对象的未来状态,从而解决了这个问题。我们的模型由一个共享特征提取器和一个预测网络组成,共享特征提取器并行地从每个输入点集中提取全局特征,预测网络对这些特征进行聚合和推理,以便将来进行预测。我们方法的关键概念是,我们使用全局特征而不是局部特征来实现对输入置换的不变性,并确保模型的稳定性和可伸缩性。实验表明,我们的模型在合成数据集和真实数据集上都达到了最先进的性能,具有实时预测速度。我们提供定量和定性分析,说明为什么我们的方法比现有方法更有效。 摘要:Learning the physical dynamics of deformable objects with particle-based representation has been the objective of many computational models in machine learning. While several state-of-the-art models have achieved this objective in simulated environments, most existing models impose a precondition, such that the input is a sequence of ordered point sets - i.e., the order of the points in each point set must be the same across the entire input sequence. This restrains the model to generalize to real-world data, which is considered to be a sequence of unordered point sets. In this paper, we propose a model named time-wise PointNet (TP-Net) that solves this problem by directly consuming a sequence of unordered point sets to infer the future state of a deformable object with particle-based representation. Our model consists of a shared feature extractor that extracts global features from each input point set in parallel and a prediction network that aggregates and reasons on these features for future prediction. The key concept of our approach is that we use global features rather than local features to achieve invariance to input permutations and ensure the stability and scalability of our model. Experiments demonstrate that our model achieves state-of-the-art performance in both synthetic dataset and in real-world dataset, with real-time prediction speed. We provide quantitative and qualitative analysis on why our approach is more effective and efficient than existing approaches.
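The permutation-invariant global feature extraction that the abstract emphasises can be sketched as a shared per-point MLP followed by a symmetric (max) pooling; the layer sizes below are assumptions, and this is not the authors' TP-Net code.

```python
import torch
import torch.nn as nn

class GlobalPointFeature(nn.Module):
    """Shared per-point MLP followed by max pooling: the output is invariant to point order."""
    def __init__(self, in_dim=3, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, points):                 # (batch, n_points, in_dim), any point order
        per_point = self.mlp(points)           # (batch, n_points, feat_dim)
        return per_point.max(dim=1).values     # symmetric pooling -> one global feature per set

# A sequence of T unordered point sets can then be encoded as T global features and fed
# to a prediction network, roughly the layout the abstract describes.
```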

【5】 Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization 标题:基于漏斗正则化的卷积神经网络压缩的低秩张量分解 链接:https://arxiv.org/abs/2112.03690

作者:Bo-Shiuan Chu,Che-Rung Lee 摘要:张量分解能够揭示复杂结构之间的潜在关系,是深卷积神经网络模型压缩的基本技术之一。然而,现有的大多数方法都是对网络进行分层压缩,不能提供一个令人满意的解决方案来实现全局优化。本文提出了一种利用卷积层的低秩张量分解来压缩预训练网络的模型降阶方法。我们的方法基于优化技术来选择合适的分解网络层的秩。提出了一种新的正则化方法,称为漏斗函数,以抑制压缩过程中的不重要因素,从而更容易显示适当的秩。实验结果表明,与其他张量压缩方法相比,该算法可以减少更多的模型参数。对于使用ImageNet2012的ResNet18,我们的简化模型可以达到GMAC速度的两倍以上,而精度下降仅为0.7%,在这两个指标上都优于大多数现有方法。 摘要:Tensor decomposition is one of the fundamental techniques for model compression of deep convolution neural networks owing to its ability to reveal the latent relations among complex structures. However, most existing methods compress the networks layer by layer, which cannot provide a satisfactory solution to achieve global optimization. In this paper, we proposed a model reduction method to compress the pre-trained networks using low-rank tensor decomposition of the convolution layers. Our method is based on the optimization techniques to select the proper ranks of decomposed network layers. A new regularization method, called funnel function, is proposed to suppress the unimportant factors during the compression, so the proper ranks can be revealed much easier. The experimental results show that our algorithm can reduce more model parameters than other tensor compression methods. For ResNet18 with ImageNet2012, our reduced model can reach more than two times speed up in terms of GMAC with merely 0.7% Top-1 accuracy drop, which outperforms most existing methods in both metrics.

【6】 Does Proprietary Software Still Offer Protection of Intellectual Property in the Age of Machine Learning? -- A Case Study using Dual Energy CT Data 标题:在机器学习时代,专有软件还能提供知识产权保护吗?--基于双能量CT数据的案例研究 链接:https://arxiv.org/abs/2112.03678

作者:Andreas Maier,Seung Hee Yang,Farhad Maleki,Nikesh Muthukrishnan,Reza Forghani 机构:Pattern Recognition Lab, FAU Erlangen-N¨urnberg, Department Artificial Intelligence in Medical Engineering, FAU Erlangen-N¨urnberg, McGill University Hospital, McGill University 备注:6 pages, 2 figures, 1 table, accepted on BVM 2022 摘要:在医学图像处理领域,医疗器械制造商在许多情况下通过仅装运编译软件(即二进制代码)来保护其知识产权,该二进制代码可以执行,但潜在攻击者难以理解。在本文中,我们将研究这个过程如何能够很好地保护图像处理算法。特别是,我们研究了从双能CT数据计算单能图像和碘图是否可以通过机器学习方法进行反向工程。我们的结果表明,在所有调查的情况下,仅使用一张单层图像作为训练数据,两者都可以以非常高的精度进行近似,结构相似性大于0.98。 摘要:In the domain of medical image processing, medical device manufacturers protect their intellectual property in many cases by shipping only compiled software, i.e. binary code which can be executed but is difficult to be understood by a potential attacker. In this paper, we investigate how well this procedure is able to protect image processing algorithms. In particular, we investigate whether the computation of mono-energetic images and iodine maps from dual energy CT data can be reverse-engineered by machine learning methods. Our results indicate that both can be approximated using only one single slice image as training data at a very high accuracy with structural similarity greater than 0.98 in all investigated cases.

【7】 Predict and Optimize: Through the Lens of Learning to Rank 标题:预测与优化:通过学习排名的镜头 链接:https://arxiv.org/abs/2112.03609

作者:Jayanta Mandi,Víctor Bucarey,Maxime Mulamba,Tias Guns 机构: Data Analytics Laboratory, Vrije Universiteit Brussel, Belgium, Institute of Engineering Sciences, Universidad de O’Higgins, Rancagua, Chile, Department of Computer Science, KU Leuven, Belgium 备注:Working paper 摘要:在过去的几年中,预测和优化方法(Elmachtoub和Grigas 2021;Wilder,Dilkina,TAMBE 2019)受到了越来越多的关注。这些问题的设置是将预测机器学习(ML)模型的预测反馈给下游优化问题进行决策。预测和优化方法提出通过直接优化优化求解器所做决策的质量来训练ML模型,通常是神经网络模型。然而,预测和优化方法的一个主要瓶颈是解决每个时期每个训练实例的优化问题。为了应对这一挑战,Mulamba等人(2021年)通过缓存可行的解决方案提出了噪声对比估计。在这项工作中,我们证明了噪声对比估计可以看作是学习对解决方案缓存进行排序的一种情况。我们还开发了成对和列表排序损失函数,这些损失函数可以以封闭形式区分,而无需解决优化问题。通过对这些替代损失函数的训练,我们的经验表明,我们能够最小化预测的遗憾。 摘要:In the last years predict-and-optimize approaches (Elmachtoub and Grigas 2021; Wilder, Dilkina, and Tambe 2019) have received increasing attention. These problems have the settings where the predictions of predictive machine learning (ML) models are fed to downstream optimization problems for decision making. Predict-and-optimize approaches propose to train the ML models, often neural network models, by directly optimizing the quality of decisions made by the optimization solvers. However, one major bottleneck of predict-and-optimize approaches is solving the optimization problem for each training instance at every epoch. To address this challenge, Mulamba et al. (2021) propose noise contrastive estimation by caching feasible solutions. In this work, we show the noise contrastive estimation can be considered a case of learning to rank the solution cache. We also develop pairwise and listwise ranking loss functions, which can be differentiated in closed form without the need of solving the optimization problem. By training with respect to these surrogate loss function, we empirically show that we are able to minimize the regret of the predictions.

【8】 A deep language model to predict metabolic network equilibria 标题:预测代谢网络平衡的深层语言模型 链接:https://arxiv.org/abs/2112.03588

作者:François Charton,Amaury Hayat,Sean T. McQuade,Nathaniel J. Merrill,Benedetto Piccoli 机构: Facebook AI Research, CERMICS, Ecole des Ponts ParisTech, Champs-sur-Marne, France, Department of Mathematical Sciences and Center for Computational and Integrative, Biology, Rutgers University–Camden, Cooper St, Camden, NJ, USA 摘要:我们表明,深度学习模型,特别是像Transformer这样最初用于自然语言的架构,可以在随机生成的数据集上进行训练,以非常高的精度预测代谢网络的定性和定量特征。使用标准的数学技术,我们创建了大量(4000万个元素)的随机网络,可以用来训练我们的模型。这些经过训练的模型可以在99%以上的情况下预测随机图上的网络平衡。它们还可以推广到与训练中遇到的图结构不同的图。最后,他们可以几乎完美地预测一小部分已知生物网络的平衡。我们的方法在实验数据上非常经济,并且只使用小的、浅的、深的学习模型,与机器翻译中常用的大型体系结构相去甚远。这些结果为在定量系统药理学、系统生物学和合成生物学等关键领域更广泛地使用与生物网络相关的问题的深度学习模型铺平了道路。 摘要:We show that deep learning models, and especially architectures like the Transformer, originally intended for natural language, can be trained on randomly generated datasets to predict to very high accuracy both the qualitative and quantitative features of metabolic networks. Using standard mathematical techniques, we create large sets (40 million elements) of random networks that can be used to train our models. These trained models can predict network equilibrium on random graphs in more than 99% of cases. They can also generalize to graphs with different structure than those encountered at training. Finally, they can predict almost perfectly the equilibria of a small set of known biological networks. Our approach is both very economical in experimental data and uses only small and shallow deep-learning model, far from the large architectures commonly used in machine translation. Such results pave the way for larger use of deep learning models for problems related to biological networks in key areas such as quantitative systems pharmacology, systems biology, and synthetic biology.

【9】 Defending against Model Stealing via Verifying Embedded External Features 标题:通过验证嵌入的外部特征来防御模型窃取 链接:https://arxiv.org/abs/2112.03476

作者:Yiming Li,Linghui Zhu,Xiaojun Jia,Yong Jiang,Shu-Tao Xia,Xiaochun Cao 机构:Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China, Peng Cheng Laboratory, Shenzhen, China, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China 备注:This work is accepted by the AAAI 2022. The first two authors contributed equally to this work. 11 pages 摘要:获得训练有素的模型需要昂贵的数据收集和训练程序,因此该模型是一项宝贵的知识产权。最近的研究表明,对手可以“窃取”部署的模型,即使他们没有训练样本,也无法获得模型参数或结构。目前,有一些防御方法可以缓解这种威胁,主要是通过增加模型窃取的成本。在本文中,我们通过验证可疑模型是否包含防御者指定的外部特征(defender-specified external features)的知识,从另一个角度探讨了防御。具体来说,我们通过风格迁移对少量训练样本进行修改来嵌入外部特征。然后,我们训练一个元分类器来确定模型是否从受害者那里被盗。这种方法的灵感来自于这样一种理解,即被盗模型应该包含受害者模型学习到的特征知识。我们在CIFAR-10和ImageNet数据集上检查了我们的方法。实验结果表明,我们的方法可以有效地同时检测不同类型的模型窃取,即使窃取的模型是通过多阶段窃取过程获得的。再现主要结果的代码可在Github上获得(https://github.com/zlh-thu/StealingVerification). 摘要:Obtaining a well-trained model involves expensive data collection and training procedures, therefore the model is a valuable intellectual property. Recent studies revealed that adversaries can `steal' deployed models even when they have no training samples and can not get access to the model parameters or structures. Currently, there were some defense methods to alleviate this threat, mostly by increasing the cost of model stealing. In this paper, we explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features. Specifically, we embed the external features by tempering a few training samples with style transfer. We then train a meta-classifier to determine whether a model is stolen from the victim. This approach is inspired by the understanding that the stolen models should contain the knowledge of features learned by the victim model. We examine our method on both CIFAR-10 and ImageNet datasets. Experimental results demonstrate that our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process. The codes for reproducing main results are available at Github (https://github.com/zlh-thu/StealingVerification).

【10】 Spectral Complexity-scaled Generalization Bound of Complex-valued Neural Networks 标题:复值神经网络的谱复杂度泛化界 链接:https://arxiv.org/abs/2112.03467

作者:Haowen Chen,Fengxiang He,Shiye Lei,Dacheng Tao 机构: University of Hong Kong 摘要:复值神经网络(CVNNs)在信号处理和图像识别等领域有着广泛的应用。然而,很少有工作关注于CVNN的泛化,尽管这对于确保CVNN在未知数据上的性能至关重要。本文首次证明了复值神经网络的推广界。有界标度具有谱复杂度,其主导因子是权矩阵的谱范数积。此外,当训练数据是连续的时,我们的工作为CVNN提供了一个泛化界,这也受谱复杂度的影响。理论上,这些边界是通过Maurey稀疏引理和Dudley熵积分推导出来的。根据经验,我们通过在不同的数据集上训练复值卷积神经网络来进行实验:MNIST、FashionMNIST、CIFAR-10、CIFAR-100、Tiny ImageNet和IMDB。Spearman的秩序相关系数和这些数据集上相应的p值有力地证明了网络的谱复杂度(通过权重矩阵谱范数乘积测量)与泛化能力具有统计显著相关性。 摘要:Complex-valued neural networks (CVNNs) have been widely applied to various fields, especially signal processing and image recognition. However, few works focus on the generalization of CVNNs, albeit it is vital to ensure the performance of CVNNs on unseen data. This paper is the first work that proves a generalization bound for the complex-valued neural network. The bound scales with the spectral complexity, the dominant factor of which is the spectral norm product of weight matrices. Further, our work provides a generalization bound for CVNNs when training data is sequential, which is also affected by the spectral complexity. Theoretically, these bounds are derived via Maurey Sparsification Lemma and Dudley Entropy Integral. Empirically, we conduct experiments by training complex-valued convolutional neural networks on different datasets: MNIST, FashionMNIST, CIFAR-10, CIFAR-100, Tiny ImageNet, and IMDB. Spearman's rank-order correlation coefficients and the corresponding p values on these datasets give strong proof that the spectral complexity of the network, measured by the weight matrices spectral norm product, has a statistically significant correlation with the generalization ability.
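The dominant factor of the bound, the product of the spectral norms of the weight matrices, is easy to compute for a trained network; the flattening of convolution kernels to (out_channels, -1) matrices below is one common convention and an assumption here.

```python
import torch

def spectral_norm_product(model):
    """Product of spectral norms of all weight matrices (the dominant factor of the bound)."""
    product = 1.0
    for name, p in model.named_parameters():
        if p.dim() >= 2 and name.endswith("weight"):
            mat = p.reshape(p.shape[0], -1)
            product *= torch.linalg.matrix_norm(mat, ord=2).item()  # largest singular value
    return product
```

Correlating this quantity with the measured generalization gap across trained models is essentially the Spearman analysis the abstract reports.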

【11】 Equal Bits: Enforcing Equally Distributed Binary Network Weights 标题:相等比特:实施均匀分布的二进制网络权重 链接:https://arxiv.org/abs/2112.03406

作者:Yunqiang Li,Silvia L. Pintea,Jan C. van Gemert 机构:Computer Vision Lab, Delft University of Technology, Delft, Netherlands 摘要:二进制网络非常有效,因为它们只使用两个符号来定义网络:${+1,-1}$。人们可以将这些符号的优先分配作为一种设计选择。秦等人最近的IR网络认为,对二进制权重施加具有相等优先级(相等比特率)的伯努利分布会导致最大熵,从而使信息损失最小化。然而,之前的工作无法精确控制训练过程中的二元权重分布,因此无法保证最大熵。这里,我们展示了使用最优传输的量化可以保证任何比特率,包括相等的比特率。我们通过实验研究了等比特率确实更可取,并表明我们的方法带来了优化效益。我们表明,与最先进的二值化方法相比,我们的量化方法是有效的,即使在使用二值权重修剪时也是如此。 摘要:Binary networks are extremely efficient as they use only two symbols to define the network: ${+1,-1}$. One can make the prior distribution of these symbols a design choice. The recent IR-Net of Qin et al. argues that imposing a Bernoulli distribution with equal priors (equal bit ratios) over the binary weights leads to maximum entropy and thus minimizes information loss. However, prior work cannot precisely control the binary weight distribution during training, and therefore cannot guarantee maximum entropy. Here, we show that quantizing using optimal transport can guarantee any bit ratio, including equal ratios. We investigate experimentally that equal bit ratios are indeed preferable and show that our method leads to optimization benefits. We show that our quantization method is effective when compared to state-of-the-art binarization methods, even when using binary weight pruning.
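Enforcing an exact bit ratio can be illustrated with a rank-based assignment, which is the form an optimal-transport plan takes when the target is a two-point distribution; this is a simplification for illustration only, not the paper's full quantization method.

```python
import torch

def binarize_with_bit_ratio(weights, ratio_plus=0.5):
    """Map exactly a `ratio_plus` fraction of the entries to +1 and the rest to -1.

    The largest weights are assigned +1 and the smallest -1, so the prescribed bit
    ratio (e.g. equal bits for ratio_plus=0.5) is guaranteed by construction.
    """
    flat = weights.flatten()
    n_minus = int(round((1.0 - ratio_plus) * flat.numel()))
    order = torch.argsort(flat)                 # ascending order of weight values
    binary = torch.ones_like(flat)
    binary[order[:n_minus]] = -1.0              # smallest n_minus entries become -1
    return binary.view_as(weights)
```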

【12】 A Novel Deep Parallel Time-series Relation Network for Fault Diagnosis 标题:一种用于故障诊断的新型深度并行时序关系网络 链接:https://arxiv.org/abs/2112.03405

作者:Chun Yang 机构:School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China 摘要:考虑到应用时间序列数据上下文信息的模型可以提高故障诊断性能,提出了一些神经网络结构,如RNN、LSTM和GRU,以有效地对工业过程进行建模。然而,这些模型受到串行计算的限制,因此无法实现高诊断效率。并行CNN也很难有效地实现故障诊断,因为它需要更大的卷积核或深层结构来实现长期的特征提取能力。另外,BERT模型采用绝对位置嵌入的方法将上下文信息引入模型中,会给原始数据带来噪声,不能直接应用于故障诊断。为了解决上述问题,本文提出了一种名为深度并行时序关系网络(DPTRN)的故障诊断模型。DPTRN主要有三个优点:(1)我们提出的时间关系单元基于全多层感知器(MLP)结构,因此,DPTRN以并行方式执行故障诊断,显著提高了计算效率。(2) 通过对绝对位置嵌入的改进,我们的新型解耦位置嵌入单元可以直接应用于故障诊断,并学习上下文信息。(3) 我们提出的DPTRN在特征可解释性方面具有明显的优势。我们的模型在TE和KDD-CUP99数据集上都优于其他方法,这证实了所提出的DPTRN模型的有效性、效率和可解释性。 摘要:Considering the models that apply the contextual information of time-series data could improve the fault diagnosis performance, some neural network structures such as RNN, LSTM, and GRU were proposed to model the industrial process effectively. However, these models are restricted by their serial computation and hence cannot achieve high diagnostic efficiency. Also the parallel CNN is difficult to implement fault diagnosis in an efficient way because it requires larger convolution kernels or deep structure to achieve long-term feature extraction capabilities. Besides, BERT model applies absolute position embedding to introduce contextual information to the model, which would bring noise to the raw data and therefore cannot be applied to fault diagnosis directly. In order to address the above problems, a fault diagnosis model named deep parallel time-series relation network (DPTRN) has been proposed in this paper. There are mainly three advantages for DPTRN: (1) Our proposed time relationship unit is based on a full multilayer perceptron (MLP) structure, therefore, DPTRN performs fault diagnosis in a parallel way and improves computing efficiency significantly. (2) By improving the absolute position embedding, our novel decoupling position embedding unit could be applied on the fault diagnosis directly and learn contextual information. (3) Our proposed DPTRN has obvious advantage in feature interpretability. Our model outperforms other methods on both TE and KDD-CUP99 datasets which confirms the effectiveness, efficiency and interpretability of the proposed DPTRN model.

【13】 Efficient Continuous Manifold Learning for Time Series Modeling 标题:用于时间序列建模的高效连续流形学习 链接:https://arxiv.org/abs/2112.03379

作者:Seungwoo Jeong,Wonjun Ko,Ahmad Wisnu Mulyadi,Heung-Il Suk 机构:Department of Artificial Intelligence, Korea University, Department of Brain and Cognitive Engineering, Korea University 摘要:随着深度神经网络在不同领域的空前成功,非欧几里德数据建模正引起人们的关注。特别是,由于对称正定(SPD)矩阵能够学习适当的统计表示,它在计算机视觉、信号处理和医学图像分析中正受到积极的研究。然而,由于其强大的约束,对于优化问题或低效的计算成本仍然具有挑战性,尤其是在深度学习框架内。本文提出利用黎曼流形与Cholesky空间之间的微分同胚映射,不仅可以有效地解决优化问题,而且可以大大降低计算量。此外,为了在时间序列数据中进行动力学建模,我们通过系统地集成流形常微分方程和选通递归神经网络,设计了一种连续流形学习方法。值得注意的是,由于Cholesky空间中矩阵的良好参数化,因此可以直接使用配备了黎曼几何度量的网络来训练我们提出的网络。我们通过实验证明,所提出的模型能够高效可靠地训练,并且在两个分类任务:动作识别和睡眠分级分类中都优于现有的流形方法和最新的分类方法。 摘要:Modeling non-Euclidean data is drawing attention along with the unprecedented successes of deep neural networks in diverse fields. In particular, symmetric positive definite (SPD) matrix is being actively studied in computer vision, signal processing, and medical image analysis, thanks to its ability to learn appropriate statistical representations. However, due to its strong constraints, it remains challenging for optimization problems or inefficient computation costs, especially, within a deep learning framework. In this paper, we propose to exploit a diffeomorphism mapping between Riemannian manifolds and a Cholesky space, by which it becomes feasible not only to efficiently solve optimization problems but also to reduce computation costs greatly. Further, in order for dynamics modeling in time series data, we devise a continuous manifold learning method by integrating a manifold ordinary differential equation and a gated recurrent neural network in a systematic manner. It is noteworthy that because of the nice parameterization of matrices in a Cholesky space, it is straightforward to train our proposed network with Riemannian geometric metrics equipped. We demonstrate through experiments that the proposed model can be efficiently and reliably trained as well as outperform existing manifold methods and state-of-the-art methods in two classification tasks: action recognition and sleep staging classification.
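
作为补充,下面用 NumPy 给出 SPD 矩阵与 Cholesky 空间之间一种常见微分同胚参数化(对角取对数的 log-Cholesky 形式)的示意实现;它只用于说明"把强约束的 SPD 流形映射到无约束空间"这一思路,具体细节未必与论文一致。

```python
import numpy as np

def spd_to_cholesky_space(S):
    """SPD 矩阵 -> Cholesky 空间:取下三角因子 L(S = L L^T),并对 L 的对角元素取对数,
    得到一个无约束的下三角表示(log-Cholesky 参数化,细节未必与论文一致)。"""
    L = np.linalg.cholesky(S)
    return np.tril(L, k=-1) + np.diag(np.log(np.diag(L)))

def cholesky_space_to_spd(X):
    """逆映射:恢复 Cholesky 因子并重建 SPD 矩阵。"""
    L = np.tril(X, k=-1) + np.diag(np.exp(np.diag(X)))
    return L @ L.T

if __name__ == "__main__":
    A = np.random.randn(4, 4)
    S = A @ A.T + 4.0 * np.eye(4)                     # 构造一个 SPD 矩阵
    X = spd_to_cholesky_space(S)
    print(np.allclose(cholesky_space_to_spd(X), S))   # True:映射是可逆的
```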

【14】 Graphical Models with Attention for Context-Specific Independence and an Application to Perceptual Grouping 标题:关注上下文独立性的图形模型及其在知觉分组中的应用 链接:https://arxiv.org/abs/2112.03371

作者:Guangyao Zhou,Wolfgang Lehrach,Antoine Dedieu,Miguel Lázaro-Gredilla,Dileep George 机构:Vicarious AI 摘要:离散无向图形模型,也称为马尔可夫随机场(MRF),可以灵活地编码多变量的概率交互作用,并在广泛的问题中得到了成功的应用。然而,离散MRF的一个众所周知但很少研究的局限性是,它们不能捕获特定于上下文的独立性(CSI)。现有的方法需要精心开发的理论和专门构建的推理方法,这将它们的应用局限于小规模问题。在本文中,我们提出了马尔可夫注意模型(MAM),这是一个包含注意机制的离散MRF家族。注意机制允许变量在忽略其他变量的同时动态关注其他变量,并允许在MRF中捕获CSI。MAM被表述为MRF,允许它受益于丰富的现有MRF推理方法,并扩展到大型模型和数据集。为了演示MAM在规模上捕获CSI的能力,我们应用MAM来捕获一种重要类型的CSI,该CSI以符号方式呈现在感知分组中的循环计算中。在最近提出的两个合成感知分组任务和真实图像上的实验表明,与强递归神经网络基线相比,MAM在样本效率、可解释性和可推广性方面具有优势,并验证了MAM在大规模有效捕获CSI的能力。 摘要:Discrete undirected graphical models, also known as Markov Random Fields (MRFs), can flexibly encode probabilistic interactions of multiple variables, and have enjoyed successful applications to a wide range of problems. However, a well-known yet little studied limitation of discrete MRFs is that they cannot capture context-specific independence (CSI). Existing methods require carefully developed theories and purpose-built inference methods, which limit their applications to only small-scale problems. In this paper, we propose the Markov Attention Model (MAM), a family of discrete MRFs that incorporates an attention mechanism. The attention mechanism allows variables to dynamically attend to some other variables while ignoring the rest, and enables capturing of CSIs in MRFs. A MAM is formulated as an MRF, allowing it to benefit from the rich set of existing MRF inference methods and scale to large models and datasets. To demonstrate MAM's capabilities to capture CSIs at scale, we apply MAMs to capture an important type of CSI that is present in a symbolic approach to recurrent computations in perceptual grouping. Experiments on two recently proposed synthetic perceptual grouping tasks and on realistic images demonstrate the advantages of MAMs in sample-efficiency, interpretability and generalizability when compared with strong recurrent neural network baselines, and validate MAM's capabilities to efficiently capture CSIs at scale.

【15】 Associative Memories Using Complex-Valued Hopfield Networks Based on Spin-Torque Oscillator Arrays 标题:基于自旋扭矩振荡器阵列的复值Hopfield网络联想存储器 链接:https://arxiv.org/abs/2112.03358

作者:Nitin Prasad,Prashansa Mukim,Advait Madhavan,Mark D. Stiles 机构:Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA, Department of Chemistry and Biochemistry, University of Maryland, College Park, MD, USA 备注:17 pages, 7 figures 摘要:基于自旋转矩振荡器的复值Hopfield网络仿真可以恢复相位编码图像。忆阻器增强逆变器序列提供可调延迟元件,通过移相振荡器的振荡输出来实现复数权重。伪逆训练足以在一组192个振荡器中存储至少12个图像,对应16$\times$12像素的图像。恢复图像所需的能量取决于所需的误差水平。对于这里考虑的振荡器和电路,相对理想图像5%的均方根偏差需要大约5 $\mu$s,消耗大约130 nJ。仿真结果表明,当振荡器谐振频率的相对离散度可调谐到小于$10^{-3}$(具体取决于反馈强度)时,网络工作良好。 摘要:Simulations of complex-valued Hopfield networks based on spin-torque oscillators can recover phase-encoded images. Sequences of memristor-augmented inverters provide tunable delay elements that implement complex weights by phase shifting the oscillatory output of the oscillators. Pseudo-inverse training suffices to store at least 12 images in a set of 192 oscillators, representing 16$\times$12 pixel images. The energy required to recover an image depends on the desired error level. For the oscillators and circuitry considered here, 5 % root mean square deviations from the ideal image require approximately 5 $\mu$s and consume roughly 130 nJ. Simulations show that the network functions well when the resonant frequency of the oscillators can be tuned to have a fractional spread less than $10^{-3}$, depending on the strength of the feedback.
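
下面是复值 Hopfield 网络伪逆学习与相位恢复的一个纯软件示意(NumPy),用来说明摘要中"伪逆训练 + 相位编码模式恢复"的基本机制;其中的规模(192 个振荡器、12 个模式)沿用摘要设定,其余细节均为假设,与硬件实现无关。

```python
import numpy as np

def train_pseudoinverse(patterns):
    """伪逆学习规则 W = X · X^+,使每个存储的相位编码模式成为网络的不动点。
    patterns: (N, P) 复数矩阵,每列是一个单位模的相位编码模式。"""
    X = np.asarray(patterns, dtype=complex)
    return X @ np.linalg.pinv(X)

def recall(W, probe, steps=20):
    """同步迭代更新,并把每个神经元投影回单位圆(只保留相位信息)。"""
    s = np.asarray(probe, dtype=complex)
    for _ in range(steps):
        s = W @ s
        s = s / np.maximum(np.abs(s), 1e-12)
    return s

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, P = 192, 12                                     # 摘要中的规模:192 个振荡器、12 个模式
    X = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(N, P)))
    W = train_pseudoinverse(X)
    noisy = X[:, 0] * np.exp(1j * 0.3 * rng.normal(size=N))   # 给相位加噪声
    rec = recall(W, noisy)
    overlap = np.abs(np.vdot(rec, X[:, 0])) / N
    print("与原模式的重叠度:", round(float(overlap), 3))        # 理想情况下接近 1
```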

【16】 Toward a Taxonomy of Trust for Probabilistic Machine Learning 标题:面向概率机器学习的信任分类研究 链接:https://arxiv.org/abs/2112.03270

作者:Tamara Broderick,Andrew Gelman,Rachael Meager,Anna L. Smith,Tian Zheng 机构:Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology;, Department of Statistics, Columbia University;, Department of Political Science, Columbia University; 备注:18 pages, 2 figures 摘要:概率机器学习越来越多地为医学、经济、政治等领域的关键决策提供信息。我们需要证据来证明最终的决定是有根据的。为了帮助在这些决策中建立信任,我们开发了一种分类法,描述了分析中的信任可以分解的地方:(1)将现实世界的目标转化为特定可用训练数据集上的目标,(2)将训练数据上的抽象目标转化为具体的数学问题,(3)在使用算法解决所述数学问题时,(4)在使用特定代码实现所选算法时。我们详细说明了信任如何在每一步都会失败,并通过两个案例研究说明了我们的分类:小额信贷的有效性分析和《经济学人》对2020年美国总统选举的预测。最后,我们描述了各种各样的方法,这些方法可用于在分类的每个步骤中增加信任。使用我们的分类法突出了现有信任研究工作集中的步骤,以及建立信任特别具有挑战性的步骤。 摘要:Probabilistic machine learning increasingly informs critical decisions in medicine, economics, politics, and beyond. We need evidence to support that the resulting decisions are well-founded. To aid development of trust in these decisions, we develop a taxonomy delineating where trust in an analysis can break down: (1) in the translation of real-world goals to goals on a particular set of available training data, (2) in the translation of abstract goals on the training data to a concrete mathematical problem, (3) in the use of an algorithm to solve the stated mathematical problem, and (4) in the use of a particular code implementation of the chosen algorithm. We detail how trust can fail at each step and illustrate our taxonomy with two case studies: an analysis of the efficacy of microcredit and The Economist's predictions of the 2020 US presidential election. Finally, we describe a wide variety of methods that can be used to increase trust at each step of our taxonomy. The use of our taxonomy highlights steps where existing research work on trust tends to concentrate and also steps where establishing trust is particularly challenging.

【17】 A Deep-Learning Intelligent System Incorporating Data Augmentation for Short-Term Voltage Stability Assessment of Power Systems 标题:一种结合数据增强的电力系统短期电压稳定评估深度学习智能系统 链接:https://arxiv.org/abs/2112.03265

作者:Yang Li,Meng Zhang,Chen Chen 机构:a School of Electrical Engineering, Northeast Electric Power University, Jilin , China, b National Key Laboratory of Science and Technology on Vessel Integrated Power System, Naval University of, Engineering, Wuhan , China 备注:Accepted by Applied Energy 摘要:面对昂贵且繁琐的数据采集和注释的困难,如何使基于深度学习的短期电压稳定评估(STVSA)模型在较小的训练数据集上运行良好是一个具有挑战性和紧迫性的问题。虽然应急模拟可以直接生成足够大的数据集,但该数据生成过程通常繁琐且效率低下;而数据扩充提供了一种低成本、高效的方法,通过保留标签的变换,人为地膨胀具有代表性的、多样化的训练数据集。在这方面,本文提出了一种新的深度学习智能系统,用于电力系统的STVSA。首先,由于缺乏可靠的定量标准来判断特定电力系统的稳定状态,因此利用半监督聚类学习来获取原始小数据集中的标记样本。其次,为了使深度学习适用于小数据集,引入了基于条件最小二乘生成对抗网络(LSGAN)的数据扩充,通过人工创建额外的有效样本来扩展原始数据集。第三,为了从系统受干扰后的动态轨迹中提取时间依赖关系,建立了基于注意机制的双向门控递归单元评估模型,双向学习重要的时间依赖关系并自动分配注意权重。测试结果表明,提出的方法能够在原始小数据集上获得更好的精度和更快的响应时间。除了分类准确性外,这项工作还采用统计方法全面检查提案的执行情况。 摘要:Facing the difficulty of expensive and trivial data collection and annotation, how to make a deep learning-based short-term voltage stability assessment (STVSA) model work well on a small training dataset is a challenging and urgent problem. Although a big enough dataset can be directly generated by contingency simulation, this data generation process is usually cumbersome and inefficient; while data augmentation provides a low-cost and efficient way to artificially inflate the representative and diversified training datasets with label preserving transformations. In this respect, this paper proposes a novel deep-learning intelligent system incorporating data augmentation for STVSA of power systems. First, due to the unavailability of reliable quantitative criteria to judge the stability status for a specific power system, semi-supervised cluster learning is leveraged to obtain labeled samples in an original small dataset. Second, to make deep learning applicable to the small dataset, conditional least squares generative adversarial networks (LSGAN)-based data augmentation is introduced to expand the original dataset via artificially creating additional valid samples. Third, to extract temporal dependencies from the post-disturbance dynamic trajectories of a system, a bi-directional gated recurrent unit with attention mechanism based assessment model is established, which bi-directionally learns the significant time dependencies and automatically allocates attention weights. The test results demonstrate the presented approach manages to achieve better accuracy and a faster response time with original small datasets. Besides classification accuracy, this work employs statistical measures to comprehensively examine the performance of the proposal.

【18】 Image Enhancement via Bilateral Learning 标题:基于双边学习的图像增强 链接:https://arxiv.org/abs/2112.03888

作者:Saeedeh Rezaee,Nezam Mahdavi-Amiri 机构:Sharif University of Technology, Tehran, Iran 摘要:如今,由于先进的数字成像技术和公众的互联网接入,生成的数字图像数量急剧增加。因此,对自动图像增强技术的需求非常明显。近年来,深度学习得到了有效的应用。本文在介绍了近年来在图像增强方面的一些研究成果后,提出了一种基于卷积神经网络的图像增强系统。我们的目标是有效地利用两种可用的方法,卷积神经网络和双边网格。在我们的方法中,我们增加了训练数据和模型维度,并在训练过程中提出了一个可变比率。与其他可用方法相比,由我们提出的方法产生的增强结果(包括5位不同的专家)显示了定量和定性的改进。 摘要:Nowadays, due to advanced digital imaging technologies and internet accessibility to the public, the number of generated digital images has increased dramatically. Thus, the need for automatic image enhancement techniques is quite apparent. In recent years, deep learning has been used effectively. Here, after introducing some recently developed works on image enhancement, an image enhancement system based on convolutional neural networks is presented. Our goal is to make an effective use of two available approaches, convolutional neural network and bilateral grid. In our approach, we increase the training data and the model dimensions and propose a variable rate during the training process. The enhancement results produced by our proposed method, while incorporating 5 different experts, show both quantitative and qualitative improvements as compared to other available methods.

【19】 Physics guided deep learning generative models for crystal materials discovery 标题:物理引导的深度学习晶体材料发现的产生式模型 链接:https://arxiv.org/abs/2112.03528

作者:Yong Zhao,Edirisuriya MD Siriwardane,Jianjun Hu 机构:Department of Computer Science and Engineering, University of South Carolina, Assembly Street, Columbia, SC 摘要:基于深度学习的生成模型(如deepfake)能够生成令人惊叹的图像和视频。然而,当这些模型用于生成晶体材料结构时,可能需要进行重大转换,其中构建块、物理原子与像素非常不同。天真的转换生成模型倾向于生成大部分物理上不可行的晶体结构,这些晶体结构不稳定或不可合成。在此,我们表明,通过开发和添加面向物理的数据增强、损失函数项和后处理,我们基于深度对抗网络(GAN)的生成模型现在可以生成具有更高物理可行性的晶体结构,并扩展我们以前只能创建立方结构的模型。 摘要:Deep learning based generative models such as deepfake have been able to generate amazing images and videos. However, these models may need significant transformation when applied to generate crystal materials structures in which the building blocks, the physical atoms are very different from the pixels. Naively transferred generative models tend to generate a large portion of physically infeasible crystal structures that are not stable or synthesizable. Herein we show that by exploiting and adding physically oriented data augmentation, loss function terms, and post processing, our deep adversarial network (GAN) based generative models can now generate crystal structures with higher physical feasibility and expand our previous models which can only create cubic structures.

【20】 Training Deep Models to be Explained with Fewer Examples 标题:训练深度模型使其可用更少的示例来解释 链接:https://arxiv.org/abs/2112.03508

作者:Tomoharu Iwata,Yuya Yoshikawa 机构:NTT Communication Science Laboratories, Software Technology and Artificial Intelligence Research Laboratory, Chiba Institute of Technology 摘要:尽管深度模型具有很高的预测性能,但人类很难理解它们所做的预测。可解释性对于真实应用程序来说很重要,以证明其可靠性。已经提出了许多基于示例的解释方法,例如representer point selection(表示点选择),其中由一组训练示例定义的解释模型用于解释预测模型。为了提高可解释性,减少解释模型中的示例数非常重要。然而,使用较少示例的解释可能是不忠实的,因为用这种基于示例的解释模型很难很好地逼近预测模型。不忠实的解释意味着解释模型的预测与预测模型的预测不同。我们提出了一种训练深度模型的方法,使得它们的预测能够被解释模型用少量的例子忠实地解释。我们使用稀疏正则化器同时训练预测和解释模型,以减少示例数。该方法可用于任何基于神经网络的预测模型。使用多个数据集的实验表明,该方法在保持预测性能的同时提高了解释的忠实度。 摘要:Although deep models achieve high predictive performance, it is difficult for humans to understand the predictions they made. Explainability is important for real-world applications to justify their reliability. Many example-based explanation methods have been proposed, such as representer point selection, where an explanation model defined by a set of training examples is used for explaining a prediction model. For improving the interpretability, reducing the number of examples in the explanation model is important. However, the explanations with fewer examples can be unfaithful since it is difficult to approximate prediction models well by such example-based explanation models. The unfaithful explanations mean that the predictions by the explainable model are different from those by the prediction model. We propose a method for training deep models such that their predictions are faithfully explained by explanation models with a small number of examples. We train the prediction and explanation models simultaneously with a sparse regularizer for reducing the number of examples. The proposed method can be incorporated into any neural network-based prediction models. Experiments using several datasets demonstrate that the proposed method improves faithfulness while keeping the predictive performance.

【21】 Explicitly antisymmetrized neural network layers for variational Monte Carlo simulation 标题:用于变分蒙特卡罗模拟的显式反对称神经网络层 链接:https://arxiv.org/abs/2112.03491

作者:Jeffmin Lin,Gil Goldshlager,Lin Lin 机构:Department of Mathematics, University of California, Berkeley, CA, USA, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA 备注:33 pages, 14 figures 摘要:神经网络和量子蒙特卡罗方法的结合已成为高精度电子结构计算的一条前进道路。以前的方案将等变神经网络层与反对称层相结合,以满足电子波函数的反对称要求。然而,到目前为止,还不清楚这样能否表示具有物理意义的反对称函数,而且反对称层的表达能力也难以度量。这项工作试图通过引入显式反对称化的通用神经网络层作为诊断工具来解决这个问题。我们首先引入一个通用反对称(GA)层,用它来替换高精度拟设FermiNet的整个反对称层。我们证明了由此产生的FermiNet-GA结构可以有效地给出小系统的精确基态能量。然后,我们考虑一个分解的反对称(FA)层,它用反对称化神经网络的乘积代替行列式的乘积,从而更直接地推广FermiNet。有趣的是,由此产生的FermiNet-FA体系结构并不优于FermiNet。这表明,反对称量乘积之和是FermiNet架构的一个关键限制因素。为了进一步探索这一点,我们研究了FermiNet的一个微小修改,称为完全行列式模式,它用单个组合行列式替换每个行列式乘积。完全单行列式FermiNet缩小了标准单行列式FermiNet与FermiNet-GA之间的大部分差距。令人惊讶的是,在解离键长为4.0 Bohr的氮分子上,完全单行列式FermiNet可以显著优于标准的64行列式FermiNet,所得能量与现有最佳计算基准相差不超过0.4 kcal/mol。 摘要:The combination of neural networks and quantum Monte Carlo methods has arisen as a path forward for highly accurate electronic structure calculations. Previous proposals have combined equivariant neural network layers with an antisymmetric layer to satisfy the antisymmetry requirements of the electronic wavefunction. However, to date it is unclear if one can represent antisymmetric functions of physical interest, and it is difficult to measure the expressiveness of the antisymmetric layer. This work attempts to address this problem by introducing explicitly antisymmetrized universal neural network layers as a diagnostic tool. We first introduce a generic antisymmetric (GA) layer, which we use to replace the entire antisymmetric layer of the highly accurate ansatz known as the FermiNet. We demonstrate that the resulting FermiNet-GA architecture can yield effectively the exact ground state energy for small systems. We then consider a factorized antisymmetric (FA) layer which more directly generalizes the FermiNet by replacing products of determinants with products of antisymmetrized neural networks. Interestingly, the resulting FermiNet-FA architecture does not outperform the FermiNet. This suggests that the sum of products of antisymmetries is a key limiting aspect of the FermiNet architecture. To explore this further, we investigate a slight modification of the FermiNet called the full determinant mode, which replaces each product of determinants with a single combined determinant. The full single-determinant FermiNet closes a large part of the gap between the standard single-determinant FermiNet and FermiNet-GA. Surprisingly, on the nitrogen molecule at a dissociating bond length of 4.0 Bohr, the full single-determinant FermiNet can significantly outperform the standard 64-determinant FermiNet, yielding an energy within 0.4 kcal/mol of the best available computational benchmark.
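
摘要中"显式反对称化层"的核心运算可以用下面的小例子说明:对任意函数在所有置换上按置换符号求和。这只是一个 $n!$ 复杂度的诊断性示意(NumPy),并非 FermiNet-GA 的实际网络实现。

```python
import itertools
import math
import numpy as np

def antisymmetrize(f, X):
    """显式反对称化:A[f](x_1,...,x_n) = (1/n!) * sum_pi sign(pi) * f(x_{pi(1)},...,x_{pi(n)})。
    X 形状为 (n, d);复杂度 O(n!),只适合很小的 n,用作诊断工具的示意。"""
    n = X.shape[0]
    total = 0.0
    for perm in itertools.permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        total += (-1) ** inversions * f(X[list(perm)])   # 按置换符号加权求和
    return total / math.factorial(n)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=2)
    f = lambda X: float(np.tanh(X @ w).prod())           # 任意一个非对称的标量函数
    X = rng.normal(size=(3, 2))
    a = antisymmetrize(f, X)
    X_swapped = X[[1, 0, 2]]                              # 交换前两个"电子"
    print(np.isclose(antisymmetrize(f, X_swapped), -a))  # True:满足反对称性
```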

【22】 Emulating Spatio-Temporal Realizations of Three-Dimensional Isotropic Turbulence via Deep Sequence Learning Models 标题:用深序列学习模型模拟三维各向同性湍流的时空实现 链接:https://arxiv.org/abs/2112.03469

作者:Mohammadreza Momenifar,Enmao Diao,Vahid Tarokh,Andrew D. Bragg 机构:Department of Civil and Environmental Engineering,Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA 备注:AI2ASE: AAAI Workshop on AI to Accelerate Science and Engineering, 2022 摘要:我们使用数据驱动的方法,使用尖端的深度学习技术对三维湍流进行建模。深度学习框架结合了流的物理约束,例如保持不可压缩性和速度梯度张量的全局统计不变量。使用基于统计和物理的指标评估模型的准确性。该数据集来自立方箱内不可压缩、统计平稳、各向同性湍流的直接数值模拟。由于数据集的大小是内存密集型的,我们首先生成速度数据的低维表示,然后将其传递给序列预测网络,该网络学习基础数据的空间和时间相关性。降维是通过使用矢量量化自动编码器(VQ-AE)进行提取来实现的,该编码器学习离散的潜在变量。对于序列预测,使用了自然语言处理中的Transformer架构的思想,并将其性能与更标准的递归网络(如卷积LSTM)进行了比较。这些体系结构被设计和训练为执行序列到序列的多类分类任务,其中它们获取固定长度(k)的输入序列,并预测固定长度(p)的序列,表示流的未来时刻。我们的短期预测结果表明,由于预测的自回归性质,两个模型的预测结果的准确性在预测快照中都会恶化。根据我们的诊断测试,经过训练的Conv-Transformer模型优于Conv-LSTM模型,能够在定量和定性上准确地保留大尺度,并很好地捕获流动的惯性尺度,但无法恢复小的和间歇的流体运动。 摘要:We use a data-driven approach to model a three-dimensional turbulent flow using cutting-edge Deep Learning techniques. The deep learning framework incorporates physical constraints on the flow, such as preserving incompressibility and global statistical invariants of velocity gradient tensor. The accuracy of the model is assessed using statistical and physics-based metrics. The data set comes from Direct Numerical Simulation of an incompressible, statistically stationary, isotropic turbulent flow in a cubic box. Since the size of the dataset is memory intensive, we first generate a low-dimensional representation of the velocity data, and then pass it to a sequence prediction network that learns the spatial and temporal correlations of the underlying data. The dimensionality reduction is performed via extraction using Vector-Quantized Autoencoder (VQ-AE), which learns the discrete latent variables. For the sequence forecasting, the idea of Transformer architecture from natural language processing is used, and its performance compared against more standard Recurrent Networks (such as Convolutional LSTM). These architectures are designed and trained to perform a sequence to sequence multi-class classification task in which they take an input sequence with a fixed length (k) and predict a sequence with a fixed length (p), representing the future time instants of the flow. Our results for the short-term predictions show that the accuracy of results for both models deteriorates across predicted snapshots due to autoregressive nature of the predictions. Based on our diagnostics tests, the trained Conv-Transformer model outperforms the Conv-LSTM one and can accurately, both quantitatively and qualitatively, retain the large scales and capture well the inertial scales of flow but fails at recovering the small and intermittent fluid motions.

【23】 Using Image Transformations to Learn Network Structure 标题:利用图像变换学习网络结构 链接:https://arxiv.org/abs/2112.03419

作者:Brayan Ortiz,Amitabh Sinha 机构:†These authors contributed equally to this work. 备注:11 pages, 6 figures, 5 tables, In Submission with International Journal of Data Science and Analytics, Special Issue: Domain Driven Data Mining 摘要:许多学习任务需要观察一系列图像并做出决定。在设计和规划节点间装运箱的运输问题中,我们展示了如何将节点网络和节点之间的流视为图像。这些图像具有有用的结构信息,可以进行统计总结。使用图像压缩技术,我们将图像压缩为一组数字,其中包含可解释的地理信息,我们称之为地理特征。通过使用地理特征,我们可以了解可用于推荐未来网络连接的网络结构。我们开发了一种贝叶斯强化算法,该算法利用统计汇总的网络信息作为先验信息和用户决策来强化代理的概率决策。 摘要:Many learning tasks require observing a sequence of images and making a decision. In a transportation problem of designing and planning for shipping boxes between nodes, we show how to treat the network of nodes and the flows between them as images. These images have useful structural information that can be statistically summarized. Using image compression techniques, we reduce an image down to a set of numbers that contain interpretable geographic information that we call geographic signatures. Using geographic signatures, we learn network structure that can be utilized to recommend future network connectivity. We develop a Bayesian reinforcement algorithm that takes advantage of statistically summarized network information as priors and user-decisions to reinforce an agent's probabilistic decision.

其他(22篇)

【1】 Information is Power: Intrinsic Control via Information Capture 标题:信息就是权力:通过信息捕获实现内在控制 链接:https://arxiv.org/abs/2112.03899

作者:Nicholas Rhinehart,Jenny Wang,Glen Berseth,John D. Co-Reyes,Danijar Hafner,Chelsea Finn,Sergey Levine 机构:UC Berkeley, University of Toronto, Google Research, Brain Team, Stanford University 备注:NeurIPS 2021 摘要:人类和动物探索他们的环境并获得有用的技能,即使在没有明确目标的情况下,也表现出内在的动机。人工智能体的内在动机研究涉及以下问题:智能体的良好通用目标是什么?我们研究了动态部分观测环境中的这一问题,并认为一个紧凑且通用的学习目标是最小化使用潜在状态空间模型估计的agent状态访问的熵。这一目标促使一个主体既收集有关其环境的信息,从而减少不确定性,又获得对其环境的控制,从而减少未来世界国家的不可预测性。我们将此方法实例化为一个配备有深度变分贝叶斯滤波器的深度强化学习代理。我们发现,我们的代理在各种部分观察到的环境中学习发现、表示和控制动态对象,这些环境是通过视觉观察感知到的,而没有外部奖励。 摘要:Humans and animals explore their environment and acquire useful skills even in the absence of clear goals, exhibiting intrinsic motivation. The study of intrinsic motivation in artificial agents is concerned with the following question: what is a good general-purpose objective for an agent? We study this question in dynamic partially-observed environments, and argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model. This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states. We instantiate this approach as a deep reinforcement learning agent equipped with a deep variational Bayes filter. We find that our agent learns to discover, represent, and exercise control of dynamic objects in a variety of partially-observed environments sensed with visual observations without extrinsic reward.

【2】 Towards a Shared Rubric for Dataset Annotation 标题:迈向数据集注释的共享标准 链接:https://arxiv.org/abs/2112.03867

作者:Andrew Marc Greene 机构:Adobe 备注:4 pages. To be presented at the Data-Centric AI Workshop at NeurIPS 2021 摘要:在安排第三方数据注释时,很难比较竞争供应商应用最佳实践创建高质量数据集的情况。这导致了一场“竞逐到底”,完全基于价格的竞争使得供应商很难对高质量的注释收费。我们提出了一个自愿性的准则,可用于(a)作为记分卡来比较供应商的产品,(b)比今天更清楚、更一致地传达我们对供应商的期望,(c)证明选择最低投标人以外的人的费用是合理的,以及(d)鼓励注释提供商改进他们的做法。 摘要:When arranging for third-party data annotation, it can be hard to compare how well the competing providers apply best practices to create high-quality datasets. This leads to a "race to the bottom," where competition based solely on price makes it hard for vendors to charge for high-quality annotation. We propose a voluntary rubric which can be used (a) as a scorecard to compare vendors' offerings, (b) to communicate our expectations of the vendors more clearly and consistently than today, (c) to justify the expense of choosing someone other than the lowest bidder, and (d) to encourage annotation providers to improve their practices.

【3】 Traversing within the Gaussian Typical Set: Differentiable Gaussianization Layers for Inverse Problems Augmented by Normalizing Flows 标题:在高斯典型集合内的遍历:归一化流增强的反问题的可微高斯化层 链接:https://arxiv.org/abs/2112.03860

作者:Dongzhuo Li,Huseyin Denli 机构:ExxonMobil Research & Engineering Company, Annandale, NJ, USA 备注:16 pages, 12 figures 摘要:生成网络(如归一化流)可以作为一种基于学习的先验来增强反问题,以获得高质量的结果。然而,在反演过程中遍历潜在空间时,潜在空间向量可能不再是来自期望的高维标准高斯分布的典型样本。因此,获得高保真解可能是一个挑战,尤其是在存在噪声和不准确的基于物理的模型的情况下。为了解决这个问题,我们建议使用新的可微、数据相关的层对潜在向量进行重新参数化和高斯化,其中自定义算子通过求解优化问题来定义。这些层迫使反演在潜在空间的高斯典型集合内寻找可行解。我们在图像去模糊任务和eikonal层析成像(一种PDE约束反问题)上测试并验证了我们的技术,并获得了高保真的结果。 摘要:Generative networks such as normalizing flows can serve as a learning-based prior to augment inverse problems to achieve high-quality results. However, the latent space vector may not remain a typical sample from the desired high-dimensional standard Gaussian distribution when traversing the latent space during an inversion. As a result, it can be challenging to attain a high-fidelity solution, particularly in the presence of noise and inaccurate physics-based models. To address this issue, we propose to re-parameterize and Gaussianize the latent vector using novel differentiable data-dependent layers wherein custom operators are defined by solving optimization problems. These proposed layers enforce an inversion to find a feasible solution within a Gaussian typical set of the latent space. We tested and validated our technique on an image deblurring task and eikonal tomography -- a PDE-constrained inverse problem and achieved high-fidelity results.
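
论文中的可微高斯化层是通过求解优化问题定义的;下面给出一个高度简化的可微近似(PyTorch):先标准化,再把潜向量范数拉回 sqrt(d) 附近(标准高斯典型集合所在的位置)。函数名与参数均为示意性假设,并非论文的层定义。

```python
import torch

def gaussianize(z, eps=1e-8):
    """把潜向量拉回高维标准高斯"典型集合"附近的一个简化可微近似:
    先做零均值、单位方差标准化,再把范数重新缩放到 sqrt(d)
    (标准高斯样本的范数集中在 sqrt(d) 附近)。论文中的层由优化问题定义,此处仅为粗略示意。"""
    d = z.shape[-1]
    z = (z - z.mean(dim=-1, keepdim=True)) / (z.std(dim=-1, keepdim=True) + eps)
    return z * (d ** 0.5) / (z.norm(dim=-1, keepdim=True) + eps)

if __name__ == "__main__":
    z = 5.0 * torch.randn(2, 512) + 3.0     # 偏离典型集合的潜向量
    print(gaussianize(z).norm(dim=-1))      # 约等于 sqrt(512) ≈ 22.6
```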

【4】 Grounded Language-Image Pre-training 标题:扎根的语言-图像预训练 链接:https://arxiv.org/abs/2112.03857

作者:Liunian Harold Li,Pengchuan Zhang,Haotian Zhang,Jianwei Yang,Chunyuan Li,Yiwu Zhong,Lijuan Wang,Lu Yuan,Lei Zhang,Jenq-Neng Hwang,Kai-Wei Chang,Jianfeng Gao 机构:UCLA,Microsoft Research,University of Washington, University of Wisconsin-Madison,Microsoft Cloud and AI,International Digital Economy Academy 备注:Code will be released at this https URL 摘要:本文提出了一个用于学习对象级、语言感知和语义丰富的视觉表征的扎根语言-图像预训练(GLIP)模型。GLIP将目标检测和短语定位(phrase grounding)统一用于预训练。这种统一带来了两个好处:1)它允许GLIP同时从检测和定位数据中学习,以改进两个任务并引导出一个好的定位模型;2)GLIP可以利用海量图像-文本对,以自训练的方式生成定位框,使学习到的表示语义丰富。在我们的实验中,我们在2700万条定位数据上预训练GLIP,其中包括300万人工标注的和2400万网络爬取的图像-文本对。学习到的表示对各种对象级识别任务显示出很强的零样本和少样本迁移能力。1) 当直接在COCO和LVIS上进行评估时(预训练期间没有见过COCO中的任何图像),GLIP分别达到49.8 AP和26.9 AP,超过了许多有监督的基线。2) 在COCO上进行微调后,GLIP在val上达到60.8 AP,在test-dev上达到61.5 AP,超过了之前的SoTA。3) 当迁移到13个下游目标检测任务时,单样本(1-shot)GLIP可与完全监督的Dynamic Head相媲美。代码将发布于 https://github.com/microsoft/GLIP。 摘要:This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. GLIP unifies object detection and phrase grounding for pre-training. The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich. In our experiments, we pre-train GLIP on 27M grounding data, including 3M human-annotated and 24M web-crawled image-text pairs. The learned representations demonstrate strong zero-shot and few-shot transferability to various object-level recognition tasks. 1) When directly evaluated on COCO and LVIS (without seeing any images in COCO during pre-training), GLIP achieves 49.8 AP and 26.9 AP, respectively, surpassing many supervised baselines. 2) After fine-tuned on COCO, GLIP achieves 60.8 AP on val and 61.5 AP on test-dev, surpassing prior SoTA. 3) When transferred to 13 downstream object detection tasks, a 1-shot GLIP rivals with a fully-supervised Dynamic Head. Code will be released at https://github.com/microsoft/GLIP.

【5】 Augment & Valuate : A Data Enhancement Pipeline for Data-Centric AI 标题:扩充与赋值:一种以数据为中心的人工智能数据增强管道 链接:https://arxiv.org/abs/2112.03837

作者:Youngjune Lee,Oh Joon Kwon,Haeju Lee,Joonyoung Kim,Kangwook Lee,Kee-Eung Kim 机构:School of Computing, KAIST, Daejeon, Republic of Korea, Kim Jaechul Graduate School of AI, KAIST, Daejeon, Republic of Korea, Samsung Research, Republic of Korea 备注:Data Centric AI Workshop at NeurIPS 2021 摘要:数据稀缺性和噪声是机器学习工业应用中的重要问题。然而,设计一种可扩展的、通用的方法来解决具有黑盒模型的数据集的基本分布和语义特性往往是一个挑战。因此,以数据为中心的方法对于机器学习操作管道的自动化至关重要。为了作为这种自动化的基础,我们提出了一种领域不可知的管道,用于改进图像分类问题中的数据质量。此管道包含数据评估、清理和扩充。通过这些方法的适当组合,我们可以在以数据为中心的人工智能竞赛中,仅使用提供的数据集就可以实现84.711%的测试准确率(排名第6,最具创新性的荣誉奖)。 摘要:Data scarcity and noise are important issues in industrial applications of machine learning. However, it is often challenging to devise a scalable and generalized approach to address the fundamental distributional and semantic properties of dataset with black box models. For this reason, data-centric approaches are crucial for the automation of machine learning operation pipeline. In order to serve as the basis for this automation, we suggest a domain-agnostic pipeline for refining the quality of data in image classification problems. This pipeline contains data valuation, cleansing, and augmentation. With an appropriate combination of these methods, we could achieve 84.711% test accuracy (ranked #6, Honorable Mention in the Most Innovative) in the Data-Centric AI competition only with the provided dataset.

【6】 A Continuous-time Stochastic Gradient Descent Method for Continuous Data 标题:连续数据的一种连续时间随机梯度下降法 链接:https://arxiv.org/abs/2112.03754

作者:Kexin Jin,Jonas Latz,Chenguang Liu,Carola-Bibiane Schönlieb 机构:Department of Mathematics, Princeton University, Princeton, NJ, USA, School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, United Kingdom, Delft Institute of Applied Mathematics, Technische Universiteit Delft, Delft, The Netherlands 摘要:具有连续数据的优化问题出现在鲁棒机器学习、函数型数据分析和变分推理等领域。这里,目标函数由一族(连续)带索引的目标函数关于某个概率测度的积分给出。这类问题通常可以通过随机优化方法来解决:使用随机切换的索引,对相应索引的目标函数执行优化步骤。在这项工作中,我们研究了连续数据优化问题的随机梯度下降算法的一种连续时间变体。这种所谓的随机梯度过程由一个梯度流构成,该梯度流最小化带索引的目标函数,并与决定该索引的连续时间索引过程相耦合。索引过程例如可以是紧致空间上的反射扩散、纯跳跃过程或其他Lévy过程。因此,我们研究了连续数据空间的多种采样模式,并允许在算法运行时模拟或流式传入数据。我们分析了随机梯度过程的逼近性质,并研究了它在恒定和递减学习率下的长期行为和遍历性。最后,我们举例说明了随机梯度过程在含噪声函数型数据的多项式回归问题以及物理信息神经网络中的适用性。 摘要:Optimization problems with continuous data appear in, e.g., robust machine learning, functional data analysis, and variational inference. Here, the target function is given as an integral over a family of (continuously) indexed target functions - integrated with respect to a probability measure. Such problems can often be solved by stochastic optimization methods: performing optimization steps with respect to the indexed target function with randomly switched indices. In this work, we study a continuous-time variant of the stochastic gradient descent algorithm for optimization problems with continuous data. This so-called stochastic gradient process consists in a gradient flow minimizing an indexed target function that is coupled with a continuous-time index process determining the index. Index processes are, e.g., reflected diffusions, pure jump processes, or other Lévy processes on compact spaces. Thus, we study multiple sampling patterns for the continuous data space and allow for data simulated or streamed at runtime of the algorithm. We analyze the approximation properties of the stochastic gradient process and study its longtime behavior and ergodicity under constant and decreasing learning rates. We end with illustrating the applicability of the stochastic gradient process in a polynomial regression problem with noisy functional data, as well as in a physics-informed neural network.

【7】 A coarse space acceleration of deep-DDM 标题:深DDM的一种粗空间加速 链接:https://arxiv.org/abs/2112.03732

作者:Valentin Mercier,Serge Gratton,Pierre Boudier 机构:∗, †, ‡ 摘要:使用深度学习方法解决偏微分方程是一个正在全面扩展的领域。特别是,实现物理域采样并使用惩罚违反偏微分方程的损失函数的物理信息神经网络已显示出巨大的潜力。然而,为了解决实际应用中遇到的大规模问题并与现有的偏微分方程数值方法竞争,设计具有良好可扩展性的并行算法是非常重要的。在传统的区域分解方法(DDM)的脉络中,我们考虑了最近提出的深度DDM方法。我们提出了这种方法的一个扩展,它依赖于使用粗空间校正,类似于在传统DDM解算器中所做的。我们的研究表明,由于每次迭代时子域之间的瞬时信息交换,当子域数量增加时,粗校正能够缓解解算器收敛性的恶化。实验结果表明,我们的方法在减少额外计算量的情况下,显著加快了原有的deep-ddm方法。 摘要:The use of deep learning methods for solving PDEs is a field in full expansion. In particular, Physical Informed Neural Networks, that implement a sampling of the physical domain and use a loss function that penalizes the violation of the partial differential equation, have shown their great potential. Yet, to address large scale problems encountered in real applications and compete with existing numerical methods for PDEs, it is important to design parallel algorithms with good scalability properties. In the vein of traditional domain decomposition methods (DDM), we consider the recently proposed deep-ddm approach. We present an extension of this method that relies on the use of a coarse space correction, similarly to what is done in traditional DDM solvers. Our investigations shows that the coarse correction is able to alleviate the deterioration of the convergence of the solver when the number of subdomains is increased thanks to an instantaneous information exchange between subdomains at each iteration. Experimental results demonstrate that our approach induces a remarkable acceleration of the original deep-ddm method, at a reduced additional computational cost.

【8】 Correlation Based Feature Subset Selection for Multivariate Time-Series Data 标题:基于相关性的多变量时间序列数据特征子集选择 链接:https://arxiv.org/abs/2112.03705

作者:Bahavathy Kathirgamanathan,Padraig Cunningham 机构:School of Computer Science, University College Dublin, Pádraig Cunningham 备注:15 pages, 5 figures 摘要:多元时间序列数据流中的相关性意味着,给定的数据挖掘任务通常只需要特征的一小部分。在本文中,我们提出了一种称为时间序列数据Merit评分(Merit Score for Time-Series data,MSTS)的技术,该技术基于单个特征分类器输出的相关模式进行特征子集选择。我们为特征子集分配一个Merit评分,作为选择"好"特征子集的基础。在UEA多元时间序列存档的数据集上对所提出的技术进行了评估,并与用于特征子集选择的包装器(Wrapper)方法进行了比较。MSTS被证明对特征子集选择是有效的,并且作为数据缩减技术特别有效。在选择合适的特征子集方面,MSTS在计算上比包装器策略更有效,对于一些较大的数据集,MSTS的速度比包装器策略快100倍以上,同时也保持了良好的分类精度。 摘要:Correlations in streams of multivariate time series data means that typically, only a small subset of the features are required for a given data mining task. In this paper, we propose a technique which we call Merit Score for Time-Series data (MSTS) that does feature subset selection based on the correlation patterns of single feature classifier outputs. We assign a Merit Score to the feature subsets which is used as the basis for selecting 'good' feature subsets. The proposed technique is evaluated on datasets from the UEA multivariate time series archive and is compared against a Wrapper approach for feature subset selection. MSTS is shown to be effective for feature subset selection and is in particular effective as a data reduction technique. MSTS is shown here to be computationally more efficient than the Wrapper strategy in selecting a suitable feature subset, being more than 100 times faster for some larger datasets while also maintaining a good classification accuracy.
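
下面按 CFS 风格写出一个基于单特征分类器输出相关性的 Merit 分数示意(NumPy);MSTS 的具体定义以论文为准,这里只用于说明"与标签相关性高、特征之间相关性低的子集得分更高"这一思路,函数名与公式形式均为假设。

```python
import numpy as np

def merit_score(feature_outputs, y):
    """CFS 风格的 Merit 分数:与标签相关性高、彼此相关性低的特征子集得分更高。
    feature_outputs: (n_samples, k),每列是一个单特征分类器的输出。"""
    k = feature_outputs.shape[1]
    r_cf = float(np.mean([abs(np.corrcoef(feature_outputs[:, i], y)[0, 1])
                          for i in range(k)]))                 # 特征输出与标签的平均相关
    if k > 1:
        r_ff = float(np.mean([abs(np.corrcoef(feature_outputs[:, i], feature_outputs[:, j])[0, 1])
                              for i in range(k) for j in range(i + 1, k)]))  # 特征之间的平均相关
    else:
        r_ff = 0.0
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=200).astype(float)
    informative = np.column_stack([y + 0.3 * rng.normal(size=200) for _ in range(3)])
    noise = rng.normal(size=(200, 3))
    print(merit_score(informative, y), merit_score(noise, y))   # 前者明显更高
```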

【9】 Construction de variables à l'aide de classifieurs comme aide à la régression 标题:利用分类器构造变量以辅助回归 链接:https://arxiv.org/abs/2112.03703

作者:Colin Troisemaine,Vincent Lemaire 机构:Orange Labs Lannion 备注:in French 摘要:本文提出了一种自动创建变量(在回归情况下)的方法,以补充初始输入向量中包含的信息。该方法作为预处理步骤,将待回归变量的连续值离散为一组区间,然后用于定义值阈值。然后训练分类器预测待回归的值是否小于或等于这些阈值中的每一个。然后,分类器的不同输出以附加变量向量的形式连接起来,从而丰富回归问题的初始向量。因此,可以将实现的系统视为通用的预处理工具。我们用5种类型的回归器测试了所提出的富集方法,并在33个回归数据集中对其进行了评估。我们的实验结果证实了该方法的价值。 摘要:This paper proposes a method for the automatic creation of variables (in the case of regression) that complement the information contained in the initial input vector. The method works as a pre-processing step in which the continuous values of the variable to be regressed are discretized into a set of intervals which are then used to define value thresholds. Then classifiers are trained to predict whether the value to be regressed is less than or equal to each of these thresholds. The different outputs of the classifiers are then concatenated in the form of an additional vector of variables that enriches the initial vector of the regression problem. The implemented system can thus be considered as a generic pre-processing tool. We tested the proposed enrichment method with 5 types of regressors and evaluated it in 33 regression datasets. Our experimental results confirm the interest of the approach.
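
按摘要描述的预处理思路,可以写出如下示意实现(Python/scikit-learn):把回归目标按分位数离散成阈值,为每个阈值训练一个"是否小于等于该阈值"的分类器,并把分类器输出拼接为附加特征。分类器类型与阈值数量均为假设,并非论文的官方设定。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def enrich_with_threshold_classifiers(X, y, n_thresholds=5):
    """把回归目标 y 按分位数离散成阈值,为每个阈值训练一个分类器预测 y <= 阈值,
    并把各分类器输出的概率拼接成附加特征(分类器类型与阈值数量均为假设)。
    实际使用时应在训练集上拟合分类器,再对训练/测试集分别做变换。"""
    thresholds = np.quantile(y, np.linspace(0.1, 0.9, n_thresholds))
    extra = []
    for t in thresholds:
        clf = LogisticRegression(max_iter=1000).fit(X, (y <= t).astype(int))
        extra.append(clf.predict_proba(X)[:, 1])   # P(y <= t | x)
    return np.hstack([X, np.column_stack(extra)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 4))
    y = X[:, 0] * 2.0 + rng.normal(size=300)
    X_enriched = enrich_with_threshold_classifiers(X, y)
    print(X.shape, "->", X_enriched.shape)          # (300, 4) -> (300, 9)
```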

【10】 Domain Generalization via Progressive Layer-wise and Channel-wise Dropout 标题:基于渐进式逐层和逐通道Dropout的域泛化 链接:https://arxiv.org/abs/2112.03676

作者:Jintao Guo,Lei Qi,Yinghuan Shi,Yang Gao 机构: National Key Laboratory for Novel Software Technology, Nanjing University, National Institute of Healthcare Data Science, Nanjing University, Key Lab of Computer Network and Information Integration, Southeast University 摘要:通过在多个观测源域上训练一个模型,域泛化的目的是在无需进一步训练的情况下很好地泛化到任意不可见的目标域。现有的工作主要集中在学习领域不变特征以提高泛化能力。然而,由于目标域在训练过程中不可用,以前的方法不可避免地会受到源域过度拟合的影响。为了解决这个问题,我们开发了一个有效的基于dropout的框架来扩大模型的关注范围,这可以有效地缓解过度拟合问题。特别是,与通常在固定层上进行dropout的典型方案不同,我们首先随机选择一层,然后随机选择该层的通道进行dropout。此外,我们还利用渐进式方案在训练过程中逐步增大dropout比率,从而逐渐提高训练模型的难度,增强模型的鲁棒性。此外,为了进一步缓解过度拟合问题的影响,我们利用图像级和特征级的增强方案来生成强基线模型。我们在多个基准数据集上进行了大量的实验,结果表明,我们的方法优于最先进的方法。 摘要:By training a model on multiple observed source domains, domain generalization aims to generalize well to arbitrary unseen target domains without further training. Existing works mainly focus on learning domain-invariant features to improve the generalization ability. However, since target domain is not available during training, previous methods inevitably suffer from overfitting in source domains. To tackle this issue, we develop an effective dropout-based framework to enlarge the region of the model's attention, which can effectively mitigate the overfitting problem. Particularly, different from the typical dropout scheme, which normally conducts the dropout on the fixed layer, first, we randomly select one layer, and then we randomly select its channels to conduct dropout. Besides, we leverage the progressive scheme to add the ratio of the dropout during training, which can gradually boost the difficulty of training model to enhance the robustness of the model. Moreover, to further alleviate the impact of the overfitting issue, we leverage the augmentation schemes on image-level and feature-level to yield a strong baseline model. We conduct extensive experiments on multiple benchmark datasets, which show our method can outperform the state-of-the-art methods.
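
下面用 PyTorch 给出"随机选层 + 通道级 dropout + 渐进式比率"这一机制的示意片段;具体的比率调度与插入位置以论文为准,这里的函数名与默认值均为假设。

```python
import torch

def random_channel_dropout(feat, drop_rate):
    """对特征图 (N, C, H, W) 做通道级 dropout:随机置零部分通道并缩放以保持期望不变。"""
    if drop_rate <= 0.0:
        return feat
    keep = (torch.rand(feat.shape[1], device=feat.device) > drop_rate).float()
    return feat * keep.view(1, -1, 1, 1) / (1.0 - drop_rate)

def progressive_rate(epoch, total_epochs, max_rate=0.33):
    """渐进式方案:dropout 比率随训练进程线性增大(max_rate 为假设的上限)。"""
    return max_rate * epoch / max(total_epochs, 1)

# 前向传播时的用法示意:每个 batch 随机选择一层,仅对该层输出做通道级 dropout,例如
#   layer_idx = torch.randint(num_layers, (1,)).item()
#   ...
#   if i == layer_idx:
#       x = random_channel_dropout(x, progressive_rate(epoch, total_epochs))
```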

【11】 Question Answering Survey: Directions, Challenges, Datasets, Evaluation Matrices 标题:问答调查:方向、挑战、数据集、评估矩阵 链接:https://arxiv.org/abs/2112.03572

作者:Hariom A. Pandya,Brijesh S. Bhatt 机构:Computer Engineering Department, Dharmsinh Desai University, Nadiad, Gujarat, India 摘要:在过去的十年里,互联网上可用信息的使用和数量都在增加。这种数字化导致需要自动答疑系统从冗余和过渡的知识源中提取丰富的信息。这些系统的设计是为了满足从这个巨大的知识源到使用自然语言理解(NLU)的用户查询的最突出的答案,因此明显依赖于问答(QA)领域。问答包括但不限于将用户问题映射到相关查询、检索相关信息、从检索到的信息中找到最合适的答案等步骤。目前对深度学习模型的改进表明,所有这些任务的性能都有显著提高。本文从问题类型、答案类型、证据来源、答案和建模方法等方面分析了质量保证领域的研究方向。这一细节之后是该领域的公开挑战,如自动问题生成、相似性检测和语言的低资源可用性。最后,对现有数据集和评价方法进行了综述。 摘要:The usage and amount of information available on the internet increase over the past decade. This digitization leads to the need for automated answering system to extract fruitful information from redundant and transitional knowledge sources. Such systems are designed to cater the most prominent answer from this giant knowledge source to the user query using natural language understanding (NLU) and thus eminently depends on the Question-answering(QA) field. Question answering involves but not limited to the steps like mapping of user question to pertinent query, retrieval of relevant information, finding the best suitable answer from the retrieved information etc. The current improvement of deep learning models evince compelling performance improvement in all these tasks. In this review work, the research directions of QA field are analyzed based on the type of question, answer type, source of evidence-answer, and modeling approach. This detailing followed by open challenges of the field like automatic question generation, similarity detection and, low resource availability for a language. In the end, a survey of available datasets and evaluation measures is presented.

【12】 Federated Causal Discovery 标题:联合因果发现 链接:https://arxiv.org/abs/2112.03555

作者:Erdun Gao,Junjia Chen,Li Shen,Tongliang Liu,Mingming Gong,Howard Bondell 机构:†The University of Melbourne, ‡Xi’an Jiaotong University, ⋄JD Explore Academy, §The University of Sydney 摘要:因果发现旨在从观测数据中学习因果图。迄今为止,大多数因果发现方法都需要将数据存储在中央服务器中。然而,数据所有者逐渐拒绝共享他们的个性化数据,以避免隐私泄露,这相当于切断了第一步,使这项任务更加棘手。于是出现了一个难题:\textit{我们如何从分散的数据中推断因果关系?}在本文中,在数据的加性噪声模型假设下,我们迈出了第一步,开发了一个名为DAG共享联合因果发现(DS-FCD)的基于梯度的学习框架,它可以在不直接接触本地数据的情况下学习因果图,并自然处理数据异构性。DS-FCD得益于每个本地模型的两级结构。第一级学习因果图并与服务器通信以从其他客户机获取模型信息,而第二级近似因果机制并根据自己的数据进行个性化更新以适应数据异构性。此外,DS-FCD利用等式无环约束将整个学习任务描述为一个连续优化问题,这可以通过梯度下降法自然解决。在合成数据集和真实数据集上的大量实验验证了该方法的有效性。 摘要:Causal discovery aims to learn a causal graph from observational data. To date, most causal discovery methods require data to be stored in a central server. However, data owners gradually refuse to share their personalized data to avoid privacy leakage, making this task more troublesome by cutting off the first step. A puzzle arises: $\textit{how do we infer causal relations from decentralized data?}$ In this paper, with the additive noise model assumption of data, we take the first step in developing a gradient-based learning framework named DAG-Shared Federated Causal Discovery (DS-FCD), which can learn the causal graph without directly touching local data and naturally handle the data heterogeneity. DS-FCD benefits from a two-level structure of each local model. The first level learns the causal graph and communicates with the server to get model information from other clients, while the second level approximates causal mechanisms and personally updates from its own data to accommodate the data heterogeneity. Moreover, DS-FCD formulates the overall learning task as a continuous optimization problem by taking advantage of an equality acyclicity constraint, which can be naturally solved by gradient descent methods. Extensive experiments on both synthetic and real-world datasets verify the efficacy of the proposed method.
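
摘要中提到的"等式无环约束"通常指 NOTEARS 一类的连续化约束 h(W) = tr(exp(W∘W)) - d = 0;下面给出其计算的示意(NumPy/SciPy)。DS-FCD 中的具体形式以论文为准,这里仅说明该约束如何把"图无环"变成可微的等式条件。

```python
import numpy as np
from scipy.linalg import expm

def acyclicity_penalty(W):
    """连续优化中常用的等式无环约束:h(W) = tr(exp(W ∘ W)) - d,
    当且仅当加权邻接矩阵 W 对应的有向图无环时 h(W) = 0(NOTEARS 一类的形式,仅作示意)。"""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d

if __name__ == "__main__":
    W_dag = np.array([[0., 1., 0.],
                      [0., 0., 1.],
                      [0., 0., 0.]])   # 无环:1 -> 2 -> 3
    W_cyc = np.array([[0., 1., 0.],
                      [0., 0., 1.],
                      [1., 0., 0.]])   # 有环:1 -> 2 -> 3 -> 1
    print(acyclicity_penalty(W_dag), acyclicity_penalty(W_cyc))   # 约为 0 与 大于 0
```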

【13】 Genetic Algorithm for Constrained Molecular Inverse Design 标题:遗传算法在约束分子反设计中的应用 链接:https://arxiv.org/abs/2112.03518

作者:Yurim Lee,Gydam Choi,Minsug Yoon,Cheongwon Kim 机构:Department of Artificial Intelligence and Language Engineering, Sejong University, Gyudam Choi, Department of Software Convergence, Minsung Yoon∗, Communication & Media Research Laboratory, Electronics and Telecommunications Research Institute 摘要:遗传算法适合于探索较大的搜索空间,因为它能找到近似解。正是由于这一优势,遗传算法能够有效地探索分子搜索空间等广阔而未知的空间。虽然该算法适用于搜索广阔的化学空间,但在保持分子亚结构的同时,很难优化药理学性质。为了解决这个问题,我们引入了一种具有约束分子逆向设计的遗传算法。该算法成功地产生了用于交叉和变异的有效分子。此外,它使用两阶段优化在遵守结构约束的同时优化特定属性。实验证明,我们的算法在保持结构约束的同时,能有效地找到满足特定性质的分子。 摘要:A genetic algorithm is suitable for exploring large search spaces as it finds an approximate solution. Because of this advantage, genetic algorithm is effective in exploring vast and unknown space such as molecular search space. Though the algorithm is suitable for searching vast chemical space, it is difficult to optimize pharmacological properties while maintaining molecular substructure. To solve this issue, we introduce a genetic algorithm featuring a constrained molecular inverse design. The proposed algorithm successfully produces valid molecules for crossover and mutation. Furthermore, it optimizes specific properties while adhering to structural constraints using a two-phase optimization. Experiments prove that our algorithm effectively finds molecules that satisfy specific properties while maintaining structural constraints.

【14】 Location Leakage in Federated Signal Maps 标题:联合信号图中的位置泄漏 链接:https://arxiv.org/abs/2112.03452

作者:Evita Bakopoulou,Jiang Zhang,Justin Ley,Konstantinos Psounis,Athina Markopoulou 机构:University of California Irvine, University of Southern California 摘要:我们考虑利用多个移动设备收集的测量数据来预测蜂窝网络性能(信号地图)的问题。我们在在线联合学习框架内提出了这个问题:(i)联合学习(FL)使用户能够协作训练模型,同时将训练数据保存在设备上;(ii)随着时间的推移,随着用户的移动而收集测量数据,并以在线方式用于本地训练。我们考虑一个诚实但好奇的服务器,它观察参与FL的目标用户的更新,并使用梯度深度泄漏(DLG)类型的攻击推断其位置;该攻击最初是为重建DNN图像分类器的训练数据而开发的。我们的主要观察结果是,应用于我们的设置的DLG攻击可以推断出一批本地数据的平均位置,从而可以在粗粒度上重建目标用户的轨迹。我们表明,梯度平均已经提供了中等程度的隐私保护,这是联邦平均所固有的。此外,我们提出了一种算法,设备可以在本地应用该算法来组织用于本地更新的批次,从而在不损害效用的情况下有效地保护其位置隐私。最后,我们证明了多个用户参与FL的效果取决于他们轨迹的相似性。据我们所知,这是首次在基于众包时空数据的FL环境下研究DLG攻击。 摘要:We consider the problem of predicting cellular network performance (signal maps) from measurements collected by several mobile devices. We formulate the problem within the online federated learning framework: (i) federated learning (FL) enables users to collaboratively train a model, while keeping their training data on their devices; (ii) measurements are collected as users move around over time and are used for local training in an online fashion. We consider an honest-but-curious server, who observes the updates from target users participating in FL and infers their location using a deep leakage from gradients (DLG) type of attack, originally developed to reconstruct training data of DNN image classifiers. We make the key observation that a DLG attack, applied to our setting, infers the average location of a batch of local data, and can thus be used to reconstruct the target users' trajectory at a coarse granularity. We show that a moderate level of privacy protection is already offered by the averaging of gradients, which is inherent to Federated Averaging. Furthermore, we propose an algorithm that devices can apply locally to curate the batches used for local updates, so as to effectively protect their location privacy without hurting utility. Finally, we show that the effect of multiple users participating in FL depends on the similarity of their trajectories. To the best of our knowledge, this is the first study of DLG attacks in the setting of FL from crowdsourced spatio-temporal data.

【15】 Virtual Replay Cache 标题:虚拟重放缓存 链接:https://arxiv.org/abs/2112.03421

作者:Brett Daley,Christopher Amato 机构:Khoury College of Computer Sciences, Northeastern University, Boston, MA 备注:4 pages, 1 figure, 3 tables 摘要:返回缓存是一种最新的策略,它支持使用多步估计器(例如$\lambda$-回报)进行有效的小批量训练,以实现深度强化学习。通过在连续批次中预先计算收益估计,然后将结果存储在辅助数据结构中以供以后采样,可以大大减少每次估计所花费的平均计算量。尽管如此,返回缓存的效率还是可以提高的,特别是在内存使用量大和重复数据拷贝方面。我们提出了一种新的数据结构,虚拟重放缓存(VRC),以解决这些缺点。在学习玩Atari 2600游戏时,VRC几乎消除了DQN($\lambda$)的缓存占用,并略微减少了硬件上的总训练时间。 摘要:Return caching is a recent strategy that enables efficient minibatch training with multistep estimators (e.g. the $\lambda$-return) for deep reinforcement learning. By precomputing return estimates in sequential batches and then storing the results in an auxiliary data structure for later sampling, the average computation spent per estimate can be greatly reduced. Still, the efficiency of return caching could be improved, particularly with regard to its large memory usage and repetitive data copies. We propose a new data structure, the Virtual Replay Cache (VRC), to address these shortcomings. When learning to play Atari 2600 games, the VRC nearly eliminates DQN($\lambda$)'s cache memory footprint and slightly reduces the total training time on our hardware.
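
返回缓存(以及 VRC)所预计算并存储的量就是每个时间步的 $\lambda$-回报;下面给出其反向递推计算的示意(NumPy),仅说明被缓存的量本身,不涉及 VRC 数据结构的实现细节,函数名与默认参数均为假设。

```python
import numpy as np

def lambda_returns(rewards, values, dones, gamma=0.99, lam=0.95):
    """反向递推计算一段轨迹的 λ-回报:
       G_t = r_t + γ * (1 - done_t) * [(1 - λ) * V(s_{t+1}) + λ * G_{t+1}]
    values 比 rewards 多一个元素(末尾的自举值)。示意实现。"""
    T = len(rewards)
    G = np.zeros(T)
    next_return = values[-1]
    for t in reversed(range(T)):
        bootstrap = (1.0 - lam) * values[t + 1] + lam * next_return
        next_return = rewards[t] + gamma * (1.0 - dones[t]) * bootstrap
        G[t] = next_return
    return G

if __name__ == "__main__":
    r = np.ones(5)          # 5 步奖励
    v = np.zeros(6)         # 6 个状态价值估计(含末尾自举值)
    d = np.zeros(5)         # 没有终止
    print(lambda_returns(r, v, d))   # 每个时间步的 λ-回报
```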

【16】 Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design 标题:嵌套双曲空间降维与双曲神经网络设计 链接:https://arxiv.org/abs/2112.03402

作者:Xiran Fan,Chun-Hao Yang,Baba C. Vemuri 机构:Department of Statistics, National Taiwan University, Institute of Applied Mathematical Science, Department of CISE, University of Florida 备注:19 pages, 6 figures 摘要:双曲线神经网络由于能够有效地表示分层数据集,在最近的一段时间里受到了广泛的欢迎。开发这些网络的挑战在于嵌入空间即双曲空间的非线性。双曲空间是洛伦兹群的齐次黎曼流形。大多数现有方法(除了一些例外)使用局部线性化来定义各种操作,这些操作与欧氏空间中传统深度神经网络中使用的操作并行。在本文中,我们提出了一种新的完全双曲型神经网络,它使用了投影(嵌入)的概念,然后在双曲空间中使用了内在聚集和非线性。这里的新颖之处在于投影,该投影设计用于将数据投影到低维嵌入双曲空间,从而导致嵌套双曲空间表示独立用于降维。主要的理论贡献是在洛伦兹变换下证明了所提出的嵌入是等距的和等变的。该投影在计算上是有效的,因为它可以用简单的线性运算来表示,并且由于上述等变特性,它允许权重共享。嵌套双曲空间表示是我们网络的核心组成部分,因此,我们首先将该嵌套双曲空间表示与其他降维方法(如切线PCA、主测地分析(PGA)和HoroPCA)进行比较。基于这种等变嵌入,我们开发了一种新的全双曲图卷积神经网络结构来学习投影参数。最后,我们在几个公开的数据集上展示了我们网络的比较性能。 摘要:Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space namely, the Hyperbolic space. Hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group. Most existing methods (with some exceptions) use local linearization to define a variety of operations paralleling those used in traditional deep neural networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity all within the hyperbolic space. The novelty here lies in the projection which is designed to project data on to a lower-dimensional embedded hyperbolic space and hence leads to a nested hyperbolic space representation independently useful for dimensionality reduction. The main theoretical contribution is that the proposed embedding is proved to be isometric and equivariant under the Lorentz transformations. This projection is computationally efficient since it can be expressed by simple linear operations, and, due to the aforementioned equivariance property, it allows for weight sharing. The nested hyperbolic space representation is the core component of our network and therefore, we first compare this ensuing nested hyperbolic space representation with other dimensionality reduction methods such as tangent PCA, principal geodesic analysis (PGA) and HoroPCA. Based on this equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural network architecture to learn the parameters of the projection. Finally, we present experiments demonstrating comparative performance of our network on several publicly available data sets.

【17】 Manas: Mining Software Repositories to Assist AutoML 标题:MANAS:挖掘软件库以辅助AutoML 链接:https://arxiv.org/abs/2112.03395

作者:Giang Nguyen,Johir Islam,Rangeet Pan,Hridesh Rajan 机构:Dept. of Computer Science, Iowa State University, Ames, IA, USA, Amazon Inc, Austin, TX, USA 摘要:今天,深度学习被广泛用于构建软件。深度学习带来的一个软件工程问题是,为任务找到合适的卷积神经网络(CNN)模型对开发人员来说是一个挑战。最近关于AutoML的工作,更准确地说是神经架构搜索(NAS),由Auto-Keras等工具实现,旨在通过将其本质上视为搜索问题来解决这个问题:其中起点是默认的CNN模型,对这个CNN模型的变异允许探索CNN模型的空间,找到一个最适合该问题的CNN模型。这些工作在生成高精度CNN模型方面取得了重大成功。然而,有两个问题。首先,NAS可能非常昂贵,通常需要几个小时才能完成。其次,NAS生成的CNN模型可能非常复杂,这使得理解它们变得更加困难,训练它们的成本也更高。我们提出了一种新的NAS方法,它不是从默认的CNN模型开始,而是从GitHub提取的模型库中选择初始模型。直觉是,与默认模型相比,解决类似问题的开发人员可能已经开发出了更好的起点。我们还分析了CNN模型的常见层模式,以了解开发人员为改进模型所做的更改。我们的方法使用常见的变化作为NAS中的变异算子。我们扩展了Auto-Keras来实现我们的方法。我们使用Kaggle上针对图像分类和图像回归等任务的8个得票最高的问题进行评估,结果表明,在相同的搜索时间下,在不损失准确性的情况下,Manas生成的模型的参数数量比Auto-Keras的模型少42.9%到99.6%。在GPU上进行基准测试时,Manas生成的模型的训练速度比Auto-Keras的模型快30.3%至641.6%。 摘要:Today deep learning is widely used for building software. A software engineering problem with deep learning is that finding an appropriate convolutional neural network (CNN) model for the task can be a challenge for developers. Recent work on AutoML, more precisely neural architecture search (NAS), embodied by tools like Auto-Keras aims to solve this problem by essentially viewing it as a search problem where the starting point is a default CNN model, and mutation of this CNN model allows exploration of the space of CNN models to find a CNN model that will work best for the problem. These works have had significant success in producing high-accuracy CNN models. There are two problems, however. First, NAS can be very costly, often taking several hours to complete. Second, CNN models produced by NAS can be very complex that makes it harder to understand them and costlier to train them. We propose a novel approach for NAS, where instead of starting from a default CNN model, the initial model is selected from a repository of models extracted from GitHub. The intuition being that developers solving a similar problem may have developed a better starting point compared to the default model. We also analyze common layer patterns of CNN models in the wild to understand changes that the developers make to improve their models. Our approach uses commonly occurring changes as mutation operators in NAS. We have extended Auto-Keras to implement our approach. Our evaluation using 8 top voted problems from Kaggle for tasks including image classification and image regression shows that given the same search time, without loss of accuracy, Manas produces models with 42.9% to 99.6% fewer number of parameters than Auto-Keras' models. Benchmarked on GPU, Manas' models train 30.3% to 641.6% faster than Auto-Keras' models.

【18】 Guided Imitation of Task and Motion Planning 标题:任务与运动规划的引导式模仿 链接:https://arxiv.org/abs/2112.03386

作者:Michael James McDonald,Dylan Hadfield-Menell 机构:Massachusetts Institute of Technology 备注:16 pages, 6 figures, 2 tables, submitted to Conference on Robot Learning 2021, to be published in Proceedings of Machine Learning Research 摘要:虽然现代策略优化方法可以基于感官数据完成复杂的操作,但它们在长时间范围和多子目标的问题上仍存在困难。另一方面,任务和运动规划(TAMP)方法可以扩展到很长的范围,但它们的计算成本很高,需要精确跟踪世界状态。我们提出了一种利用这两种方法的优点的方法:我们训练策略来模仿TAMP解算器的输出。这产生了一个前馈策略,可以从感官数据完成多步骤任务。首先,我们构建了一个异步分布式TAMP解算器,它能够以足够快的速度生成用于模仿学习的监督数据。然后,我们提出了一个分层策略架构,允许我们使用部分训练的控制策略来加速TAMP求解器。在具有7自由度关节控制的机器人操作任务中,部分训练的策略最多可将规划所需的时间缩短2.6倍。在这些任务中,我们可以学习到一个策略,它在88%的情况下能从物体姿态观测解决RoboSuite四物体拾取-放置(pick-place)任务;以及另一个策略,它在79%的情况下能从RGB图像解决RoboDesk 9目标基准(9个不同任务的平均值)。 摘要:While modern policy optimization methods can do complex manipulation from sensory data, they struggle on problems with extended time horizons and multiple sub-goals. On the other hand, task and motion planning (TAMP) methods scale to long horizons but they are computationally expensive and need to precisely track world state. We propose a method that draws on the strength of both methods: we train a policy to imitate a TAMP solver's output. This produces a feed-forward policy that can accomplish multi-step tasks from sensory data. First, we build an asynchronous distributed TAMP solver that can produce supervision data fast enough for imitation learning. Then, we propose a hierarchical policy architecture that lets us use partially trained control policies to speed up the TAMP solver. In robotic manipulation tasks with 7-DoF joint control, the partially trained policies reduce the time needed for planning by a factor of up to 2.6. Among these tasks, we can learn a policy that solves the RoboSuite 4-object pick-place task 88% of the time from object pose observations and a policy that solves the RoboDesk 9-goal benchmark 79% of the time from RGB images (averaged across the 9 disparate tasks).

【19】 RafterNet: Probabilistic predictions in multi-response regression 标题:RafterNet:多响应回归中的概率预测 链接:https://arxiv.org/abs/2112.03377

作者:Marius Hofert,Avinash Prasad,Mu Zhu 机构:-,- 摘要:介绍了一种在多响应回归问题中进行概率预测的全非参数方法。随机森林被用作每个响应变量的边际模型,并且,作为本研究的新贡献,多个响应变量之间的依赖性由生成性神经网络建模。这种随机森林、相应的经验边际残差分布和生成型神经网络的组合建模方法称为RafterNet。多个数据集用作示例,以证明该方法的灵活性及其对概率预测的影响。 摘要:A fully nonparametric approach for making probabilistic predictions in multi-response regression problems is introduced. Random forests are used as marginal models for each response variable and, as novel contribution of the present work, the dependence between the multiple response variables is modeled by a generative neural network. This combined modeling approach of random forests, corresponding empirical marginal residual distributions and a generative neural network is referred to as RafterNet. Multiple datasets serve as examples to demonstrate the flexibility of the approach and its impact for making probabilistic forecasts.

【20】 Cadence: A Practical Time-series Partitioning Algorithm for Unlabeled IoT Sensor Streams 标题:Cadence:一种实用的无标签物联网传感器流时间序列划分算法 链接:https://arxiv.org/abs/2112.03360

作者:Tahiya Chowdhury,Murtadha Aldeer,Shantanu Laghate,Jorge Ortiz 机构: Rutgers University 备注:27 pages, 13 figures 摘要:在大多数由机器学习驱动、基于传感器的物联网应用中,时间序列划分是一个必不可少的步骤。本文介绍了一种样本高效、鲁棒的时间序列分割模型和算法。我们表明,通过专门针对基于最大均值差异(MMD)的分割目标来学习表示,我们的算法能够在不同的应用中鲁棒地检测时间序列事件。我们的损失函数使我们能够推断连续的样本序列是否来自同一分布(零假设),并确定拒绝零假设(即来自不同分布)的样本对之间的变化点。我们展示了它在实际物联网部署中用于基于环境感知的活动识别的适用性。此外,虽然文献中已有许多关于变化点检测的工作,但我们的模型要简单得多,并且能够匹配或优于最先进的方法。我们平均只需9-93秒即可完全训练我们的模型,且对于不同应用的数据,超参数几乎不需要改变。 摘要:Timeseries partitioning is an essential step in most machine-learning driven, sensor-based IoT applications. This paper introduces a sample-efficient, robust, time-series segmentation model and algorithm. We show that by learning a representation specifically with the segmentation objective based on maximum mean discrepancy (MMD), our algorithm can robustly detect time-series events across different applications. Our loss function allows us to infer whether consecutive sequences of samples are drawn from the same distribution (null hypothesis) and determines the change-point between pairs that reject the null hypothesis (i.e., come from different distributions). We demonstrate its applicability in a real-world IoT deployment for ambient-sensing based activity recognition. Moreover, while many works on change-point detection exist in the literature, our model is significantly simpler and matches or outperforms state-of-the-art methods. We can fully train our model in 9-93 seconds on average with little variation in hyperparameters for data across different applications.
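下面给出一个仅供参考的示意性Python草图,说明基于MMD的变化点检测思想:比较相邻窗口的样本是否来自同一分布。其中的高斯核、窗口大小和模拟数据均为假设性设定,仅用于示意MMD统计量的用法,并非Cadence的原始实现。

```python
# 示意性草图:用高斯核的(有偏)MMD^2统计量在滑动窗口上检测变化点。
import numpy as np

def mmd2(x, y, sigma=1.0):
    """两组样本x、y之间的(有偏)MMD^2估计,使用高斯核。"""
    def gram(a, b):
        d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-d2 / (2 * sigma**2))
    return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()

# 模拟传感器流:前半段与后半段来自不同分布(即存在一个变化点)
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0, 1, size=(200, 3)),
                         rng.normal(2, 1, size=(200, 3))])

# 滑动地比较相邻窗口;MMD^2最大处作为候选变化点
window = 50
scores = [mmd2(stream[t - window:t], stream[t:t + window])
          for t in range(window, len(stream) - window)]
change_point = window + int(np.argmax(scores))
print(change_point)  # 预期接近真实变化点200
```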

【21】 Bless and curse of smoothness and phase transitions in nonparametric regressions: a nonasymptotic perspective 标题:非参数回归中光滑性和相变的祝福与诅咒:一个非渐近的观点 链接:https://arxiv.org/abs/2112.03626

作者:Ying Zhu 备注:3 Tables 摘要:当回归函数属于由一元函数组成的标准光滑类,即其直到$(\gamma+1)$阶的导数处处(或几乎处处)由一个公共常数限定时,众所周知,当$\gamma$有限且样本量$n\rightarrow\infty$时,均方误差(MSE)的极小极大最优收敛速率为$\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$。从考虑有限$n$的非渐近观点来看,本文表明:对于标准的H\"older类和Sobolev类,当$\frac{n}{\sigma^{2}}\precsim\left(\gamma\vee1\right)^{2\gamma+3}$时,极小极大最优速率为$\frac{\sigma^{2}\left(\gamma\vee1\right)}{n}$;当$\frac{n}{\sigma^{2}}\succsim\left(\gamma\vee1\right)^{2\gamma+3}$时,该速率为$\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$。为了建立这些结果,我们推导了广义H\"older类(其中第$k$阶($k=0,...,\gamma$)导数由参数$R_{k}$从上方限定,且第$\gamma$阶导数是$R_{\gamma+1}-$Lipschitz的)以及光滑函数的广义椭球类的覆盖数和填充数的上下界。我们的界锐化了标准类的经典度量熵结果,并给出了对$\gamma$和$R_{k}$的一般依赖性。借助新的熵界,通过推导$R_{k}=1$、$R_{k}\leq\left(k-1\right)!$和$R_{k}=k!$(后两种情况的动机见引言)下的极小极大最优MSE速率,我们展示了若干利用文献中现有熵界无法得到的有趣结果。对于$d-$元函数的H\"older类,我们的结果表明,经典渐近速率$\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+2+d}}$在有限样本中可能低估了MSE。 摘要:When the regression function belongs to the standard smooth classes consisting of univariate functions with derivatives up to the $(\gamma+1)$th order bounded by a common constant everywhere or a.e., it is well known that the minimax optimal rate of convergence in mean squared error (MSE) is $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ when $\gamma$ is finite and the sample size $n\rightarrow\infty$. From a nonasymptotic viewpoint that considers finite $n$, this paper shows that: for the standard H\"older and Sobolev classes, the minimax optimal rate is $\frac{\sigma^{2}\left(\gamma\vee1\right)}{n}$ when $\frac{n}{\sigma^{2}}\precsim\left(\gamma\vee1\right)^{2\gamma+3}$ and $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ when $\frac{n}{\sigma^{2}}\succsim\left(\gamma\vee1\right)^{2\gamma+3}$. To establish these results, we derive upper and lower bounds on the covering and packing numbers for the generalized H\"older class where the $k$th ($k=0,...,\gamma$) derivative is bounded from above by a parameter $R_{k}$ and the $\gamma$th derivative is $R_{\gamma+1}-$Lipschitz (and also for the generalized ellipsoid class of smooth functions). Our bounds sharpen the classical metric entropy results for the standard classes, and give the general dependence on $\gamma$ and $R_{k}$. By deriving the minimax optimal MSE rates under $R_{k}=1$, $R_{k}\leq\left(k-1\right)!$ and $R_{k}=k!$ (with the latter two cases motivated in our introduction) with the help of our new entropy bounds, we show a couple of interesting results that cannot be shown with the existing entropy bounds in the literature. For the H\"older class of $d-$variate functions, our result suggests that the classical asymptotic rate $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+2+d}}$ could be an underestimate of the MSE in finite samples.
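为便于阅读,下面用一个LaTeX片段将摘要中的相变结果整理为分段形式的速率;其中左端的极小极大风险记号是为示意而引入的,并非原文的记号。

```latex
% 仅为示意性整理,需要 amsmath 与 amssymb 宏包。
% 标准 H\"older / Sobolev 类的非渐近极小极大最优 MSE 速率(摘要中的相变结果):
\[
\mathrm{MSE}_{\mathrm{minimax}}(n,\gamma,\sigma)
\;\asymp\;
\begin{cases}
\dfrac{\sigma^{2}(\gamma\vee 1)}{n},
  & \dfrac{n}{\sigma^{2}} \precsim (\gamma\vee 1)^{2\gamma+3},\\[2ex]
\left(\dfrac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}},
  & \dfrac{n}{\sigma^{2}} \succsim (\gamma\vee 1)^{2\gamma+3}.
\end{cases}
\]
```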

【22】 Hybrid guiding: A multi-resolution refinement approach for semantic segmentation of gigapixel histopathological images 标题:混合引导:一种用于千兆像素组织病理图像语义分割的多分辨率细化方法 链接:https://arxiv.org/abs/2112.03455

作者:André Pedersen,Erik Smistad,Tor V. Rise,Vibeke G. Dale,Henrik S. Pettersen,Tor-Arne S. Nordmo,David Bouget,Ingerid Reinertsen,Marit Valla 机构:Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, NO-, Trondheim, Norway, Clinic of Surgery, St. Olavs Hospital, Trondheim University Hospital, NO-, Trondheim, Norway 备注:12 pages, 3 figures 摘要:组织病理学癌症诊断变得更加复杂,不断增加的活检数量对大多数病理实验室来说是一个挑战。因此,开发用于评估组织病理学癌症切片的自动方法将是有价值的。在这项研究中,我们使用了来自挪威队列的624张乳腺癌全切片图像(WSI)。我们提出了一种级联卷积神经网络设计,称为H2G-Net,用于千兆像素组织病理学图像的语义分割。该设计包括使用分块(patch-wise)方法的检测阶段和使用卷积自编码器的细化阶段。为了验证该设计,我们进行了消融研究,以评估流程中所选组件对肿瘤分割的影响。结果表明,在分割组织病理学图像时,使用分层采样和深度热图细化来引导分割是有益的。我们发现,使用细化网络对生成的肿瘤分割热图进行后处理能带来显著改进。总体最佳设计在由90张WSI组成的独立测试集上取得了0.933的Dice得分。该设计优于单分辨率方法,例如使用MobileNetV2的聚类引导分块式高分辨率分类(0.872)和低分辨率U-Net(0.874)。此外,仅使用CPU,在一张具有代表性的x400 WSI上进行分割大约需要58秒。这些发现证明了利用细化网络改进分块级预测的潜力。该解决方案是高效的,并且不需要重叠的分块推断或模型集成。此外,我们还表明,深度神经网络可以使用一种能同时在多个不同标签上进行平衡的随机采样方案进行训练,而无需在磁盘上存储分块。未来的工作应包括更高效的分块生成和采样,以及改进的聚类。 摘要:Histopathological cancer diagnostics has become more complex, and the increasing number of biopsies is a challenge for most pathology laboratories. Thus, development of automatic methods for evaluation of histopathological cancer sections would be of value. In this study, we used 624 whole slide images (WSIs) of breast cancer from a Norwegian cohort. We propose a cascaded convolutional neural network design, called H2G-Net, for semantic segmentation of gigapixel histopathological images. The design involves a detection stage using a patch-wise method, and a refinement stage using a convolutional autoencoder. To validate the design, we conducted an ablation study to assess the impact of selected components in the pipeline on tumour segmentation. Guiding segmentation, using hierarchical sampling and deep heatmap refinement, proved to be beneficial when segmenting the histopathological images. We found a significant improvement when using a refinement network for postprocessing the generated tumour segmentation heatmaps. The overall best design achieved a Dice score of 0.933 on an independent test set of 90 WSIs. The design outperformed single-resolution approaches, such as cluster-guided, patch-wise high-resolution classification using MobileNetV2 (0.872) and a low-resolution U-Net (0.874). In addition, segmentation on a representative x400 WSI took ~58 seconds, using only the CPU. The findings demonstrate the potential of utilizing a refinement network to improve patch-wise predictions. The solution is efficient and does not require overlapping patch inference or ensembling. Furthermore, we showed that deep neural networks can be trained using a random sampling scheme that balances on multiple different labels simultaneously, without the need of storing patches on disk. Future work should involve more efficient patch generation and sampling, as well as improved clustering.
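下面给出一个仅供参考的示意性Python草图,说明这种两阶段级联设计的思路:第一阶段对图像分块做逐块分类得到低分辨率热图,第二阶段用卷积自编码器式网络对热图进行细化。其中的网络结构和图像尺寸均为假设性设定,并非H2G-Net的原始实现。

```python
# 示意性草图:分块检测 + 热图细化的两阶段级联分割流程(极简版本)。
import torch
import torch.nn as nn

patch_classifier = nn.Sequential(        # 阶段1:逐块分类器(真实系统中可为MobileNetV2等)
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1), nn.Sigmoid())

refiner = nn.Sequential(                 # 阶段2:对分块级热图做细化的小型卷积网络
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

wsi = torch.rand(3, 2048, 2048)          # 假设的一张(缩小后的)全切片图像
patch = 256

# 阶段1:把WSI切成不重叠分块,逐块预测肿瘤概率,拼成低分辨率热图
rows = []
for y in range(0, wsi.shape[1], patch):
    row = []
    for x in range(0, wsi.shape[2], patch):
        p = wsi[:, y:y + patch, x:x + patch].unsqueeze(0)
        row.append(patch_classifier(p))
    rows.append(torch.cat(row, dim=0).view(-1))
heatmap = torch.stack(rows)              # 形状为 (8, 8) 的分块级热图

# 阶段2:细化网络对热图做后处理,输出更平滑的分割热图
refined = refiner(heatmap.unsqueeze(0).unsqueeze(0))
print(refined.shape)                     # torch.Size([1, 1, 8, 8])
```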

机器翻译,仅供参考