Machine Learning Academic Digest [12.17]

Posted: 2023-04-18 15:34:54

cs.LG: 125 papers in total today.

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (14 papers)

【1】 Progressive Graph Convolution Network for EEG Emotion Recognition
Link: https://arxiv.org/abs/2112.09069
Authors: Yijin Zhou, Fu Li, Yang Li, Youshuo Ji, Guangming Shi, Wenming Zheng, Lijian Zhang, Yuanfang Chen, Rui Cheng
Comments: 11 pages, 5 figures
Abstract: Studies in the area of neuroscience have revealed the relationship between emotional patterns and brain functional regions, demonstrating that dynamic relationships between different brain regions are an essential factor affecting emotion recognition determined through electroencephalography (EEG). Moreover, in EEG emotion recognition, we can observe that clearer boundaries exist between coarse-grained emotions than those between fine-grained emotions, based on the same EEG data; this indicates the concurrence of large coarse- and small fine-grained emotion variations. Thus, the progressive classification process from coarse- to fine-grained categories may be helpful for EEG emotion recognition. Consequently, in this study, we propose a progressive graph convolution network (PGCN) for capturing this inherent characteristic in EEG emotional signals and progressively learning the discriminative EEG features. To fit different EEG patterns, we constructed a dual-graph module to characterize the intrinsic relationship between different EEG channels, containing the dynamic functional connections and static spatial proximity information of brain regions from neuroscience research. Moreover, motivated by the observation of the relationship between coarse- and fine-grained emotions, we adopt a dual-head module that enables the PGCN to progressively learn more discriminative EEG features, from coarse-grained (easy) to fine-grained categories (difficult), referring to the hierarchical characteristic of emotion. To verify the performance of our model, extensive experiments were conducted on two public datasets: SEED-IV and the multi-modal physiological emotion database (MPED).
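The coarse-to-fine dual-head idea can be illustrated with a minimal sketch: a shared feature vector feeds a coarse classifier, and the fine-grained head also conditions on the coarse prediction. Module names and dimensions below are hypothetical placeholders, not the authors' released PGCN code.

```python
import torch
import torch.nn as nn

class CoarseToFineHead(nn.Module):
    """Minimal sketch of a progressive (coarse -> fine) classification head."""

    def __init__(self, feat_dim=128, n_coarse=3, n_fine=7):
        super().__init__()
        self.coarse_head = nn.Linear(feat_dim, n_coarse)
        # The fine-grained head also sees the soft coarse prediction.
        self.fine_head = nn.Linear(feat_dim + n_coarse, n_fine)

    def forward(self, h):
        coarse_logits = self.coarse_head(h)
        coarse_prob = coarse_logits.softmax(dim=-1)
        fine_logits = self.fine_head(torch.cat([h, coarse_prob], dim=-1))
        return coarse_logits, fine_logits

# Training would typically combine both objectives, e.g.
# loss = ce(coarse_logits, y_coarse) + ce(fine_logits, y_fine)
```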

【2】 Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs
Link: https://arxiv.org/abs/2112.09055
Authors: Bogdan-Adrian Manghiuc, He Sun
Comments: This work appeared at the 35th Conference on Neural Information Processing Systems (NeurIPS'21)
Abstract: Hierarchical clustering studies a recursive partition of a data set into clusters of successively smaller size, and is a fundamental problem in data analysis. In this work we study the cost function for hierarchical clustering introduced by Dasgupta, and present two polynomial-time approximation algorithms: Our first result is an $O(1)$-approximation algorithm for graphs of high conductance. Our simple construction bypasses complicated recursive routines of finding sparse cuts known in the literature. Our second and main result is an $O(1)$-approximation algorithm for a wide family of graphs that exhibit a well-defined structure of clusters. This result generalises the previous state-of-the-art, which holds only for graphs generated from stochastic models. The significance of our work is demonstrated by the empirical analysis on both synthetic and real-world data sets, on which our presented algorithm outperforms the previously proposed algorithm for graphs with a well-defined cluster structure.
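For reference, Dasgupta's cost charges every edge the size of the smallest cluster of the hierarchy that still contains both endpoints:

$$
\mathrm{cost}_G(T) \;=\; \sum_{\{i,j\} \in E} w_{ij}\,\bigl|\mathrm{leaves}\bigl(T[i \vee j]\bigr)\bigr|,
$$

where $T[i \vee j]$ is the subtree of the hierarchy $T$ rooted at the lowest common ancestor of leaves $i$ and $j$; lower cost is better, so an $O(1)$-approximation returns a tree whose cost is within a constant factor of the optimum.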

【3】 A Heterogeneous Graph Learning Model for Cyber-Attack Detection
Link: https://arxiv.org/abs/2112.08986
Authors: Mingqi Lv, Chengyu Dong, Tieming Chen, Tiantian Zhu, Qijie Song, Yuan Fan
Comments: 12 pages, 7 figures, 40 references
Abstract: A cyber-attack is a malicious attempt by experienced hackers to breach the target information system. Usually, cyber-attacks are characterized as hybrid TTPs (Tactics, Techniques, and Procedures) and long-term adversarial behaviors, making traditional intrusion detection methods ineffective. Most existing cyber-attack detection systems are implemented based on manually designed rules by referring to domain knowledge (e.g., threat models, threat intelligence). However, this process lacks intelligence and generalization ability. Aiming at this limitation, this paper proposes an intelligent cyber-attack detection method based on provenance data. To effectively and efficiently detect cyber-attacks from a huge number of system events in the provenance data, we first model the provenance data by a heterogeneous graph to capture the rich context information of each system entity (e.g., process, file, socket, etc.), and learn a semantic vector representation for each system entity. Then, we perform online cyber-attack detection by sampling a small and compact local graph from the heterogeneous graph, and classifying the key system entities as malicious or benign. We conducted a series of experiments on two provenance datasets with real cyber-attacks. The experimental results show that the proposed method outperforms other learning-based detection models, and has competitive performance against state-of-the-art rule-based cyber-attack detection systems.

【4】 Graph Structure Learning with Variational Information Bottleneck
Link: https://arxiv.org/abs/2112.08903
Authors: Qingyun Sun, Jianxin Li, Hao Peng, Jia Wu, Xingcheng Fu, Cheng Ji, Philip S. Yu
Comments: Accepted by AAAI 2022, preprint version with appendix
Abstract: Graph Neural Networks (GNNs) have shown promising results on a broad spectrum of applications. Most empirical studies of GNNs directly take the observed graph as input, assuming the observed structure perfectly depicts the accurate and complete relations between nodes. However, graphs in the real world are inevitably noisy or incomplete, which can degrade the quality of graph representations. In this work, we propose a novel Variational Information Bottleneck guided Graph Structure Learning framework, namely VIB-GSL, from the perspective of information theory. VIB-GSL advances the Information Bottleneck (IB) principle for graph structure learning, providing a more elegant and universal framework for mining underlying task-relevant relations. VIB-GSL learns an informative and compressive graph structure to distill the actionable information for specific downstream tasks. VIB-GSL deduces a variational approximation for irregular graph data to form a tractable IB objective function, which facilitates training stability. Extensive experimental results demonstrate the superior effectiveness and robustness of VIB-GSL.
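For orientation, the Information Bottleneck principle referenced here can be written in its generic form (not the paper's exact graph-level instantiation) as

$$
\min_{p(Z \mid X)} \; -I(Z; Y) \;+\; \beta\, I(Z; X),
$$

where $X$ is the input (here, graph data), $Y$ the downstream label, $Z$ the learned compressed representation or structure, and $\beta > 0$ trades off predictiveness against compression; VIB-GSL optimizes a tractable variational bound of such an objective.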

【5】 Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning
Link: https://arxiv.org/abs/2112.08830
Authors: Thilini Cooray, Ngai-Man Cheung
Comments: Accepted to AAAI 2022
Abstract: Unsupervised graph-level representation learning plays a crucial role in a variety of tasks such as molecular property prediction and community analysis, especially when data annotation is expensive. Currently, most of the best-performing graph embedding methods are based on the Infomax principle. The performance of these methods highly depends on the selection of negative samples, and can be hurt if the samples are not carefully selected. Inter-graph similarity-based methods also suffer if the selected set of graphs for similarity matching is low in quality. To address this, we focus only on utilizing the current input graph for embedding learning. We are motivated by an observation from real-world graph generation processes where the graphs are formed based on one or more global factors which are common to all elements of the graph (e.g., topic of a discussion thread, solubility level of a molecule). We hypothesize that extracting these common factors could be highly beneficial. Hence, this work proposes a new principle for unsupervised graph representation learning: Graph-wise Common latent Factor EXtraction (GCFX). We further propose a deep model for GCFX, deepGCFX, based on the idea of reversing the above-mentioned graph generation process, which can explicitly extract common latent factors from an input graph and achieve improved results on downstream tasks compared to the current state-of-the-art. Through extensive experiments and analysis, we demonstrate that, while extracting common latent factors is beneficial for graph-level tasks to alleviate distractions caused by local variations of individual nodes or local neighbourhoods, it also benefits node-level tasks by enabling long-range node dependencies, especially for disassortative graphs.

【6】 Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching
Link: https://arxiv.org/abs/2112.08764
Authors: Xin Liu, Yangqiu Song
Comments: Accepted by AAAI 2022
Abstract: Graph neural networks (GNNs) and message passing neural networks (MPNNs) have been proven to be expressive for subgraph structures in many applications. Some applications in heterogeneous graphs require explicit edge modeling, such as subgraph isomorphism counting and matching. However, existing message passing mechanisms are not designed well in theory. In this paper, we start from a particular edge-to-vertex transform and exploit the isomorphism property in the edge-to-vertex dual graphs. We prove that searching isomorphisms on the original graph is equivalent to searching on its dual graph. Based on this observation, we propose dual message passing neural networks (DMPNNs) to enhance the substructure representation learning in an asynchronous way for subgraph isomorphism counting and matching as well as unsupervised node classification. Extensive experiments demonstrate the robust performance of DMPNNs by combining both node and edge representation learning in synthetic and real heterogeneous graphs. Code is available at https://github.com/HKUST-KnowComp/DualMessagePassing.
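The edge-to-vertex transform mentioned here is the classical line-graph construction: each edge of the original graph becomes a vertex of the dual, and two such vertices are adjacent when the original edges share an endpoint. A quick generic illustration with NetworkX (not the authors' code):

```python
import networkx as nx

# A small graph: a triangle plus a pendant edge.
G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])

# Edge-to-vertex dual: every node of L(G) is an edge of G.
L = nx.line_graph(G)

print(sorted(L.nodes()))       # [(1, 2), (1, 3), (2, 3), (3, 4)]
print(L.number_of_edges())     # 5: pairs of original edges sharing an endpoint
```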

【7】 Self-Supervised Dynamic Graph Representation Learning via Temporal Subgraph Contrast
Link: https://arxiv.org/abs/2112.08733
Authors: Linpu Jiang, Ke-Jia Chen, Jingqiang Chen
Abstract: Self-supervised learning on graphs has recently drawn a lot of attention due to its independence from labels and its robustness in representation. Current studies on this topic mainly use static information such as graph structures but cannot well capture dynamic information such as timestamps of edges. Realistic graphs are often dynamic, which means the interaction between nodes occurs at a specific time. This paper proposes a self-supervised dynamic graph representation learning framework (DySubC), which defines a temporal subgraph contrastive learning task to simultaneously learn the structural and evolutional features of a dynamic graph. Specifically, a novel temporal subgraph sampling strategy is first proposed, which takes each node of the dynamic graph as the central node and uses both neighborhood structures and edge timestamps to sample the corresponding temporal subgraph. The subgraph representation function is then designed according to the influence of neighborhood nodes on the central node after encoding the nodes in each subgraph. Finally, the structural and temporal contrastive losses are defined to maximize the mutual information between node representation and temporal subgraph representation. Experiments on five real-world datasets demonstrate that (1) DySubC performs better than the related baselines, including two graph contrastive learning models and four dynamic graph representation learning models, in the downstream link prediction task, and (2) the use of temporal information not only helps sample more effective subgraphs, but also leads to better representations through the temporal contrastive loss.

【8】 SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Link: https://arxiv.org/abs/2112.08587
Authors: Zhecan Wang, Haoxuan You, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang
Abstract: Answering complex questions about images is an ambitious goal for machine intelligence, which requires a joint understanding of images, text, and commonsense knowledge, as well as a strong reasoning ability. Recently, multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning (VCR), by jointly understanding visual objects and text tokens through layers of cross-modality attention. However, these approaches do not utilize the rich structure of the scene and the interactions between objects which are essential in answering complex commonsense questions. We propose a Scene Graph Enhanced Image-Text Learning (SGEITL) framework to incorporate visual scene graphs in commonsense reasoning. To exploit the scene graph structure, at the model structure level, we propose a multihop graph transformer for regularizing attention interaction among hops. As for pre-training, a scene-graph-aware pre-training method is proposed to leverage structure knowledge extracted in the visual scene graph. Moreover, we introduce a method to train and generate domain-relevant visual scene graphs using textual annotations in a weakly-supervised manner. Extensive experiments on VCR and other tasks show a significant performance boost compared with the state-of-the-art methods and prove the efficacy of each proposed component.

【9】 HampDTI: A Heterogeneous Graph Automatic Meta-path Learning Method for Drug-Target Interaction Prediction
Link: https://arxiv.org/abs/2112.08567
Authors: Hongzhun Wang, Feng Huang, Wen Zhang
Comments: 9 pages, 4 figures
Abstract: Motivation: Identifying drug-target interactions (DTIs) is a key step in drug repositioning. In recent years, the accumulation of a large number of genomics and pharmacology data has formed mass drug and target related heterogeneous networks (HNs), which provides new opportunities for developing HN-based computational models to accurately predict DTIs. The HN implies lots of useful information about DTIs but also contains irrelevant data, and how to make the best of heterogeneous networks remains a challenge. Results: In this paper, we propose a heterogeneous graph automatic meta-path learning based DTI prediction method (HampDTI). HampDTI automatically learns the important meta-paths between drugs and targets from the HN, and generates meta-path graphs. For each meta-path graph, the features learned from drug molecule graphs and target protein sequences serve as the node attributes, and then a node-type specific graph convolutional network (NSGCN) which efficiently considers node type information (drugs or targets) is designed to learn embeddings of drugs and targets. Finally, the embeddings from multiple meta-path graphs are combined to predict novel DTIs. The experiments on benchmark datasets show that our proposed HampDTI achieves superior performance compared with state-of-the-art DTI prediction methods. More importantly, HampDTI identifies the important meta-paths for DTI prediction, which could explain how drugs connect with targets in HNs.

【10】 BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
Link: https://arxiv.org/abs/2112.08541
Authors: Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo
Comments: Under review
Abstract: Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data, achieving ground-breaking performance on various tasks such as node classification and graph property prediction. Nonetheless, existing systems are inefficient to train large graphs with billions of nodes and edges with GPUs. The main bottlenecks are the process of preparing data for GPUs - subgraph sampling and feature retrieving. This paper proposes BGL, a distributed GNN training system designed to address the bottlenecks with a few key ideas. First, we propose a dynamic cache engine to minimize feature retrieving traffic. By a co-design of caching policy and the order of sampling, we find a sweet spot of low overhead and high cache hit ratio. Second, we improve the graph partition algorithm to reduce cross-partition communication during subgraph sampling. Finally, careful resource isolation reduces contention between different data preprocessing stages. Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems by 20.68x on average.
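As a rough illustration of the first idea (caching hot node features to cut retrieval traffic), the toy sketch below keeps the features of the most frequently requested nodes resident locally; the scoring rule and interface are hypothetical and far simpler than BGL's actual policy/sampling co-design.

```python
from collections import Counter

class HotNodeFeatureCache:
    """Toy static cache: keep features of the most frequently sampled nodes."""

    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity          # max number of cached nodes
        self.fetch_remote = fetch_remote  # callable: node_id -> feature vector
        self.freq = Counter()
        self.cache = {}

    def get(self, node_id):
        self.freq[node_id] += 1
        if node_id in self.cache:
            return self.cache[node_id]        # cache hit: no remote I/O
        feat = self.fetch_remote(node_id)      # cache miss: fetch from storage
        if len(self.cache) < self.capacity:
            self.cache[node_id] = feat         # fill until capacity is reached
        return feat
```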

【11】 Lifelong Generative Modelling Using Dynamic Expansion Graph Model
Link: https://arxiv.org/abs/2112.08370
Authors: Fei Ye, Adrian G. Bors
Comments: Accepted in Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI 2022)
Abstract: Variational Autoencoders (VAEs) suffer from degenerated performance when learning several successive tasks. This is caused by catastrophic forgetting. In order to address the knowledge loss, VAEs are using either Generative Replay (GR) mechanisms or Expanding Network Architectures (ENA). In this paper we study the forgetting behaviour of VAEs using a joint GR and ENA methodology, by deriving an upper bound on the negative marginal log-likelihood. This theoretical analysis provides new insights into how VAEs forget the previously learnt knowledge during lifelong learning. The analysis indicates the best performance achieved when considering model mixtures, under the ENA framework, where there are no restrictions on the number of components. However, an ENA-based approach may require an excessive number of parameters. This motivates us to propose a novel Dynamic Expansion Graph Model (DEGM). DEGM expands its architecture, according to the novelty associated with each new database, when compared to the information already learnt by the network from previous tasks. DEGM training optimizes knowledge structuring, characterizing the joint probabilistic representations corresponding to the past and more recently learned tasks. We demonstrate that DEGM guarantees optimal performance for each task while also minimizing the required number of parameters. Supplementary materials (SM) and source code are available at https://github.com/dtuzi123/Expansion-Graph-Model.
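For context, the quantity being bounded is, at its core, the standard negative evidence lower bound of a VAE (the paper extends the analysis to mixtures and the lifelong setting); for a single component it reads

$$
-\log p_\theta(x) \;\le\; \mathbb{E}_{q_\phi(z \mid x)}\bigl[-\log p_\theta(x \mid z)\bigr] \;+\; \mathrm{KL}\bigl(q_\phi(z \mid x)\,\|\,p(z)\bigr),
$$

i.e. the reconstruction term plus the KL divergence between the approximate posterior and the prior.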

【12】 Multivariate Realized Volatility Forecasting with Graph Neural Network
Link: https://arxiv.org/abs/2112.09015
Authors: Qinkai Chen, Christian-Yann Robert
Comments: 13 pages, 6 tables, 4 figures
Abstract: The existing publications demonstrate that the limit order book data is useful in predicting short-term volatility in stock markets. Since stocks are not independent, changes on one stock can also impact other related stocks. In this paper, we are interested in forecasting short-term realized volatility in a multivariate approach based on limit order book data and relational data. To achieve this goal, we introduce Graph Transformer Network for Volatility Forecasting. The model allows to combine limit order book features and an unlimited number of temporal and cross-sectional relations from different sources. Through experiments based on about 500 stocks from S&P 500 index, we find a better performance for our model than for other benchmarks.

【13】 Characterization of Causal Ancestral Graphs for Time Series with Latent Confounders
Link: https://arxiv.org/abs/2112.08417
Authors: Andreas Gerhardus
Comments: 55 pages (including appendix), 16 figures
Abstract: Generalizing directed maximal ancestral graphs, we introduce a class of graphical models for representing time lag specific causal relationships and independencies among finitely many regularly sampled and regularly subsampled time steps of multivariate time series with unobserved variables. We completely characterize these graphs and show that they entail constraints beyond those that have previously been considered in the literature. This allows for stronger causal inferences without having imposed additional assumptions. In generalization of directed partial ancestral graphs we further introduce a graphical representation of Markov equivalence classes of the novel type of graphs and show that these are more informative than what current state-of-the-art causal discovery algorithms learn. We also analyze the additional information gained by increasing the number of observed time steps.

【14】 AGMI: Attention-Guided Multi-omics Integration for Drug Response Prediction with Graph Neural Networks
Link: https://arxiv.org/abs/2112.08366
Authors: Feng Ruiwei, Xie Yufeng, Lai Minshan, Chen Danny, Cao Ji, Wu Jian
Abstract: Accurate drug response prediction (DRP) is a crucial yet challenging task in precision medicine. This paper presents a novel Attention-Guided Multi-omics Integration (AGMI) approach for DRP, which first constructs a Multi-edge Graph (MeG) for each cell line, and then aggregates multi-omics features to predict drug response using a novel structure, called Graph edge-aware Network (GeNet). For the first time, our AGMI approach explores gene constraint based multi-omics integration for DRP with the whole-genome using GNNs. Empirical experiments on the CCLE and GDSC datasets show that our AGMI largely outperforms state-of-the-art DRP methods by 8.3%--34.2% on four metrics. Our data and code are available at https://github.com/yivan-WYYGDSG/AGMI.

Transformer (1 paper)

【1】 Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture
Link: https://arxiv.org/abs/2112.08534
Authors: Kieran Wood, Sven Giegerich, Stephen Roberts, Stefan Zohren
Abstract: Deep learning architectures, specifically Deep Momentum Networks (DMNs) [1904.04912], have been found to be an effective approach to momentum and mean-reversion trading. However, some of the key challenges in recent years involve learning long-term dependencies, degradation of performance when considering returns net of transaction costs and adapting to new market regimes, notably during the SARS-CoV-2 crisis. Attention mechanisms, or Transformer-based architectures, are a solution to such challenges because they allow the network to focus on significant time steps in the past and longer-term patterns. We introduce the Momentum Transformer, an attention-based architecture which outperforms the benchmarks, and is inherently interpretable, providing us with greater insights into our deep learning trading strategy. Our model is an extension to the LSTM-based DMN, which directly outputs position sizing by optimising the network on a risk-adjusted performance metric, such as Sharpe ratio. We find an attention-LSTM hybrid Decoder-Only Temporal Fusion Transformer (TFT) style architecture is the best performing model. In terms of interpretability, we observe remarkable structure in the attention patterns, with significant peaks of importance at momentum turning points. The time series is thus segmented into regimes and the model tends to focus on previous time-steps in alike regimes. We find changepoint detection (CPD) [2105.13727], another technique for responding to regime change, can complement multi-headed attention, especially when we run CPD at multiple timescales. Through the addition of an interpretable variable selection network, we observe how CPD helps our model to move away from trading predominantly on daily returns data. We note that the model can intelligently switch between, and blend, classical strategies - basing its decision on patterns in the data.
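The risk-adjusted objective mentioned above (training the network to output position sizes directly by maximising Sharpe ratio) can be sketched as follows; the annualisation constant and array shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def negative_sharpe(positions, asset_returns, periods_per_year=252):
    """Loss for a model that outputs position sizes directly.

    positions     : array of shape (T,), model output in [-1, 1] per period
    asset_returns : array of shape (T,), next-period returns of the traded asset
    """
    strategy_returns = positions * asset_returns
    sharpe = (np.mean(strategy_returns) / (np.std(strategy_returns) + 1e-8)
              * np.sqrt(periods_per_year))
    return -sharpe  # minimising this maximises the annualised Sharpe ratio
```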

GAN | adversarial | attacks | generation (9 papers)

【1】 Ensembling Off-the-shelf Models for GAN Training
Link: https://arxiv.org/abs/2112.09130
Authors: Nupur Kumari, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
Comments: GitHub: this https URL; Project webpage: this https URL
Abstract: The advent of large-scale training has produced a cornucopia of powerful visual recognition models. However, generative models, such as GANs, have traditionally been trained from scratch in an unsupervised manner. Can the collective "knowledge" from a large bank of pretrained vision models be leveraged to improve GAN training? If so, with so many models to choose from, which one(s) should be selected, and in what manner are they most effective? We find that pretrained computer vision models can significantly improve performance when used in an ensemble of discriminators. Notably, the particular subset of selected models greatly affects performance. We propose an effective selection mechanism, by probing the linear separability between real and fake samples in pretrained model embeddings, choosing the most accurate model, and progressively adding it to the discriminator ensemble. Interestingly, our method can improve GAN training in both limited data and large-scale settings. Given only 10k training samples, our FID on LSUN Cat matches the StyleGAN2 trained on 1.6M images. On the full dataset, our method improves FID by 1.5x to 2x on cat, church, and horse categories of LSUN.
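The selection mechanism described (probing the linear separability of real versus generated samples in each pretrained model's embedding space) can be sketched roughly like this; the embedding callable and data handling are placeholders, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def separability_score(embed, real_images, fake_images):
    """Higher score => this pretrained backbone separates real/fake better,
    so it is a stronger candidate to add to the discriminator ensemble.

    embed : callable mapping a batch of images to feature vectors
            (e.g. a frozen pretrained vision backbone -- placeholder here).
    """
    X = np.concatenate([embed(real_images), embed(fake_images)])
    y = np.concatenate([np.ones(len(real_images)), np.zeros(len(fake_images))])
    probe = LogisticRegression(max_iter=1000)
    return cross_val_score(probe, X, y, cv=3).mean()

# Candidate backbones would be ranked by this score and the best one added
# greedily to the discriminator ensemble, as the abstract describes.
```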

【2】 Learning and Analyzing Generation Order for Undirected Sequence Models
Link: https://arxiv.org/abs/2112.09097
Authors: Yichen Jiang, Mohit Bansal
Comments: EMNLP 2021 Findings (12 pages)
Abstract: Undirected neural sequence models have achieved performance competitive with the state-of-the-art directed sequence models that generate monotonically from left to right in machine translation tasks. In this work, we train a policy that learns the generation order for a pre-trained, undirected translation model via reinforcement learning. We show that the translations decoded by our learned orders achieve higher BLEU scores than the outputs decoded from left to right or decoded by the learned order from Mansimov et al. (2019) on the WMT'14 German-English translation task. On examples with a maximum source and target length of 30 from De-En, WMT'16 English-Romanian, and WMT'21 English-Chinese translation tasks, our learned order outperforms all heuristic generation orders on four out of six tasks. We next carefully analyze the learned order patterns via qualitative and quantitative analysis. We show that our policy generally follows an outer-to-inner order, predicting the left-most and right-most positions first, and then moving toward the middle while skipping less important words at the beginning. Furthermore, the policy usually predicts positions for a single syntactic constituent structure in consecutive steps. We believe our findings could provide more insights on the mechanism of undirected generation models and encourage further research in this direction. Our code is publicly available at https://github.com/jiangycTarheel/undirected-generation

【3】 Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs
Link: https://arxiv.org/abs/2112.09025
Authors: Ezgi Korkmaz
Comments: Published in AAAI 2022
Abstract: The use of deep neural networks as function approximators has led to striking progress for reinforcement learning algorithms and applications. Yet the knowledge we have on decision boundary geometry and the loss landscape of neural policies is still quite limited. In this paper we propose a framework to investigate the decision boundary and loss landscape similarities across states and across MDPs. We conduct experiments in various games from Arcade Learning Environment, and discover that high sensitivity directions for neural policies are correlated across MDPs. We argue that these high sensitivity directions support the hypothesis that non-robust features are shared across training environments of reinforcement learning agents. We believe our results reveal fundamental properties of the environments used in deep reinforcement learning training, and represent a tangible step towards building robust and reliable deep reinforcement learning agents.

【4】 Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning
Link: https://arxiv.org/abs/2112.08932
Authors: Trevor Ablett, Bryan Chan, Jonathan Kelly
Comments: Accepted at the NeurIPS 2021 Deep Reinforcement Learning Workshop, Sydney, Australia
Abstract: Effective exploration continues to be a significant challenge that prevents the deployment of reinforcement learning for many physical systems. This is particularly true for systems with continuous and high-dimensional state and action spaces, such as robotic manipulators. The challenge is accentuated in the sparse rewards setting, where the low-level state information required for the design of dense rewards is unavailable. Adversarial imitation learning (AIL) can partially overcome this barrier by leveraging expert-generated demonstrations of optimal behaviour and providing, essentially, a replacement for dense reward information. Unfortunately, the availability of expert demonstrations does not necessarily improve an agent's capability to explore effectively and, as we empirically show, can lead to inefficient or stagnated learning. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of, in addition to a main task, multiple auxiliary tasks. Subsequently, a hierarchical model is used to learn each task reward and policy through a modified AIL procedure, in which exploration of all tasks is enforced via a scheduler composing different tasks together. This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible. Our experimental results in a challenging multitask robotic manipulation domain indicate that our method compares favourably to supervised imitation learning and to a state-of-the-art AIL method. Code is available at https://github.com/utiasSTARS/lfgp.

【5】 Imbalanced Sample Generation and Evaluation for Power System Transient Stability Using CTGAN
Link: https://arxiv.org/abs/2112.08836
Authors: Gengshi Han, Shunyu Liu, Kaixuan Chen, Na Yu, Zunlei Feng, Mingli Song
Abstract: Although deep learning has achieved impressive advances in transient stability assessment of power systems, insufficient and imbalanced samples still limit the training effectiveness of data-driven methods. This paper proposes a controllable sample generation framework based on the Conditional Tabular Generative Adversarial Network (CTGAN) to generate specified transient stability samples. To fit the complex feature distribution of the transient stability samples, the proposed framework first models the samples as tabular data and uses Gaussian mixture models to normalize the tabular data. Then we transform multiple conditions into a single conditional vector to enable multi-conditional generation. Furthermore, this paper introduces three evaluation metrics to verify the quality of generated samples based on the proposed framework. Experimental results on the IEEE 39-bus system show that the proposed framework effectively balances the transient stability samples and significantly improves the performance of transient stability assessment models.
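The step of merging several conditions into a single conditional vector can be illustrated with a simple sketch; the column names and category sizes below are made up for illustration, and the actual framework builds this on top of CTGAN's conditional generator.

```python
import numpy as np

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def build_condition_vector(conditions, category_sizes):
    """Concatenate several categorical conditions into one conditional vector.

    conditions     : dict, e.g. {"stability_label": 1, "fault_type": 2}
    category_sizes : dict, e.g. {"stability_label": 2, "fault_type": 4}
    """
    parts = [one_hot(conditions[name], category_sizes[name])
             for name in sorted(category_sizes)]
    return np.concatenate(parts)

# e.g. an unstable sample with fault type 2:
cond = build_condition_vector({"stability_label": 1, "fault_type": 2},
                              {"stability_label": 2, "fault_type": 4})
# cond is a length-6 vector fed to the conditional generator alongside noise.
```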

【6】 Self-supervised Enhancement of Latent Discovery in GANs
Link: https://arxiv.org/abs/2112.08835
Authors: Silpa Vadakkeeveetil Sreelatha, Adarsh Kappiyath, S Sumitra
Comments: Accepted to the 36th AAAI Conference on Artificial Intelligence (AAAI 2022)
Abstract: Several methods for discovering interpretable directions in the latent space of pre-trained GANs have been proposed. Latent semantics discovered by unsupervised methods are relatively less disentangled than supervised methods since they do not use pre-trained attribute classifiers. We propose Scale Ranking Estimator (SRE), which is trained using self-supervision. SRE enhances the disentanglement in directions obtained by existing unsupervised disentanglement techniques. These directions are updated to preserve the ordering of variation within each direction in latent space. Qualitative and quantitative evaluation of the discovered directions demonstrates that our proposed method significantly improves disentanglement in various datasets. We also show that the learned SRE can be used to perform Attribute-based image retrieval task without further training.

【7】 Dataset Correlation Inference Attacks Against Machine Learning Models
Link: https://arxiv.org/abs/2112.08806
Authors: Ana-Maria Creţu, Florent Guépin, Yves-Alexandre de Montjoye
Comments: 13 pages
Abstract: Machine learning models are increasingly used by businesses and organizations around the world to automate tasks and decision-making. Trained on potentially sensitive datasets, machine learning models have been shown to leak information about individuals in the dataset as well as global dataset information. We here take research in dataset property inference attacks one step further by proposing a new attack against ML models: a dataset correlation inference attack, where an attacker's goal is to infer the correlation between input variables of a model. We first show that an attacker can exploit the spherical parametrization of correlation matrices, to make an informed guess. This means that using only the correlation between the input variables and the target variable, an attacker can infer the correlation between two input variables much better than a random guess baseline. We propose a second attack which exploits the access to a machine learning model using shadow modeling to refine the guess. Our attack uses Gaussian copula-based generative modeling to generate synthetic datasets with a wide variety of correlations in order to train a meta-model for the correlation inference task. We evaluate our attack against Logistic Regression and Multi-layer perceptron models and show it to outperform the model-less attack. Our results show that the accuracy of the second, machine learning-based attack decreases with the number of variables and converges towards the accuracy of the model-less attack. However, correlations between input variables which are highly correlated with the target variable are more vulnerable regardless of the number of variables. Our work bridges the gap between what can be considered a global leakage about the training dataset and individual-level leakages. When coupled with marginal leakage attacks, it might also constitute a first step towards dataset reconstruction.
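The Gaussian-copula generation step (drawing synthetic datasets with a prescribed correlation structure) can be sketched as follows; the marginal distributions in the example are arbitrary choices for illustration, not the paper's configuration.

```python
import numpy as np
from scipy import stats

def gaussian_copula_sample(corr, n_samples, marginals, seed=0):
    """Samples whose dependence follows a Gaussian copula with matrix `corr`
    and whose marginals are the given scipy.stats distributions."""
    rng = np.random.default_rng(seed)
    d = corr.shape[0]
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u = stats.norm.cdf(z)  # map to uniforms while keeping the dependence
    return np.column_stack([marginals[j].ppf(u[:, j]) for j in range(d)])

corr = np.array([[1.0, 0.6, 0.2],
                 [0.6, 1.0, 0.1],
                 [0.2, 0.1, 1.0]])
X = gaussian_copula_sample(corr, 5000,
                           [stats.norm(0, 1), stats.expon(), stats.uniform()])
# X is one synthetic dataset; repeating with many correlation matrices yields
# the shadow datasets used to train the meta-model.
```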

【8】 StyleMC: Multi-Channel Based Fast Text-Guided Image Generation and Manipulation
Link: https://arxiv.org/abs/2112.08493
Authors: Umut Kocasari, Alara Dirik, Mert Tiftikci, Pinar Yanardag
Comments: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2022)
Abstract: Discovering meaningful directions in the latent space of GANs to manipulate semantic attributes typically requires large amounts of labeled data. Recent work aims to overcome this limitation by leveraging the power of Contrastive Language-Image Pre-training (CLIP), a joint text-image model. While promising, these methods require several hours of preprocessing or training to achieve the desired manipulations. In this paper, we present StyleMC, a fast and efficient method for text-driven image generation and manipulation. StyleMC uses a CLIP-based loss and an identity loss to manipulate images via a single text prompt without significantly affecting other attributes. Unlike prior work, StyleMC requires only a few seconds of training per text prompt to find stable global directions, does not require prompt engineering and can be used with any pre-trained StyleGAN2 model. We demonstrate the effectiveness of our method and compare it to state-of-the-art methods. Our code can be found at http://catlab-team.github.io/stylemc.

【9】 Positional Encoding Augmented GAN for the Assessment of Wind Flow for Pedestrian Comfort in Urban Areas
Link: https://arxiv.org/abs/2112.08447
Authors: Henrik Høiness, Kristoffer Gjerde, Luca Oggiano, Knut Erik Teigen Giljarhus, Massimiliano Ruocco
Abstract: Approximating wind flows using computational fluid dynamics (CFD) methods can be time-consuming. Creating a tool for interactively designing prototypes while observing the wind flow change requires simpler models to simulate faster. Instead of running numerical approximations resulting in detailed calculations, data-driven methods in deep learning might be able to give similar results in a fraction of the time. This work rephrases the problem from computing 3D flow fields using CFD to a 2D image-to-image translation-based problem on the building footprints to predict the flow field at pedestrian height level. We investigate the use of generative adversarial networks (GAN), such as Pix2Pix [1] and CycleGAN [2], representing the state-of-the-art for image-to-image translation tasks in various domains, as well as the U-Net autoencoder [3]. The models can learn the underlying distribution of a dataset in a data-driven manner, which we argue can help the model learn the underlying Reynolds-averaged Navier-Stokes (RANS) equations from CFD. We experiment on novel simulated datasets on various three-dimensional bluff-shaped buildings with and without height information. Moreover, we present an extensive qualitative and quantitative evaluation of the generated images for a selection of models and compare their performance with the simulations delivered by CFD. We then show that adding positional data to the input can produce more accurate results by proposing a general framework for injecting such information on the different architectures. Furthermore, we show that the models' performance improves by applying attention mechanisms and spectral normalization to facilitate stable training.

Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (8 papers)

【1】 Masked Feature Prediction for Self-Supervised Visual Pre-Training
Link: https://arxiv.org/abs/2112.09133
Authors: Chen Wei, Haoqi Fan, Saining Xie, Chao-Yuan Wu, Alan Yuille, Christoph Feichtenhofer
Comments: Technical report
Abstract: We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models. Our approach first randomly masks out a portion of the input sequence and then predicts the feature of the masked regions. We study five different types of features and find Histograms of Oriented Gradients (HOG), a hand-crafted feature descriptor, works particularly well in terms of both performance and efficiency. We observe that the local contrast normalization in HOG is essential for good results, which is in line with earlier work using HOG for visual recognition. Our approach can learn abundant visual knowledge and drive large-scale Transformer-based models. Without using extra model weights or supervision, MaskFeat pre-trained on unlabeled videos achieves unprecedented results of 86.7% with MViT-L on Kinetics-400, 88.3% on Kinetics-600, 80.4% on Kinetics-700, 38.8 mAP on AVA, and 75.0% on SSv2. MaskFeat further generalizes to image input, which can be interpreted as a video with a single frame and obtains competitive results on ImageNet.
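HOG regression targets of the kind described can be computed with scikit-image; the patch size and HOG parameters below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from skimage.feature import hog

def hog_target(patch):
    """HOG descriptor for one masked patch; `block_norm` applies the local
    contrast normalisation the abstract highlights as important."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')

# Toy example: a 32x32 grayscale patch used as the regression target of a
# masked token; the model is trained to predict this 1-D feature vector.
patch = np.random.rand(32, 32)
target = hog_target(patch)
```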

【2】 Deep Generative Models for Geometric Design Under Uncertainty
Link: https://arxiv.org/abs/2112.08919
Authors: Wei Chen, Doksoo Lee, Wei Chen
Comments: AAAI 2022 Workshop on AI for Design and Manufacturing (ADAM)
Abstract: Deep generative models have demonstrated effectiveness in learning compact and expressive design representations that significantly improve geometric design optimization. However, these models do not consider the uncertainty introduced by manufacturing or fabrication. Past work that quantifies such uncertainty often makes simplified assumptions on geometric variations, while the "real-world" uncertainty and its impact on design performance are difficult to quantify due to the high dimensionality. To address this issue, we propose a Generative Adversarial Network-based Design under Uncertainty Framework (GAN-DUF), which contains a deep generative model that simultaneously learns a compact representation of nominal (ideal) designs and the conditional distribution of fabricated designs given any nominal design. We demonstrated the framework on two real-world engineering design examples and showed its capability of finding the solution that possesses better performances after fabrication.

【3】 Unsupervised Reinforcement Learning in Multiple Environments
Link: https://arxiv.org/abs/2112.08746
Authors: Mirco Mutti, Mattia Mancassola, Marcello Restelli
Comments: In 36th AAAI Conference on Artificial Intelligence (AAAI 2022)
Abstract: Several recent works have been dedicated to unsupervised reinforcement learning in a single environment, in which a policy is first pre-trained with unsupervised interactions, and then fine-tuned towards the optimal policy for several downstream supervised tasks defined over the same environment. Along this line, we address the problem of unsupervised reinforcement learning in a class of multiple environments, in which the policy is pre-trained with interactions from the whole class, and then fine-tuned for several tasks in any environment of the class. Notably, the problem is inherently multi-objective as we can trade off the pre-training objective between environments in many ways. In this work, we foster an exploration strategy that is sensitive to the most adverse cases within the class. Hence, we cast the exploration problem as the maximization of the mean of a critical percentile of the state visitation entropy induced by the exploration strategy over the class of environments. Then, we present a policy gradient algorithm, αMEPOL, to optimize the introduced objective through mediated interactions with the class. Finally, we empirically demonstrate the ability of the algorithm in learning to explore challenging classes of continuous environments and we show that reinforcement learning greatly benefits from the pre-trained exploration strategy w.r.t. learning from scratch.

【4】 Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription
Link: https://arxiv.org/abs/2112.08692
Authors: Nikolai Vogler, Jonathan Parkes Allen, Matthew Thomas Miller, Taylor Berg-Kirkpatrick
Abstract: We present a self-supervised pre-training approach for learning rich visual language representations for both handwritten and printed historical document transcription. After supervised fine-tuning of our pre-trained encoder representations for low-resource document transcription on two languages, (1) a heterogeneous set of handwritten Islamicate manuscript images and (2) early modern English printed documents, we show a meaningful improvement in recognition accuracy over the same supervised model trained from scratch with as few as 30 line image transcriptions for training. Our masked language model-style pre-training strategy, where the model is trained to be able to identify the true masked visual representation from distractors sampled from within the same line, encourages learning robust contextualized language representations invariant to scribal writing style and printing noise present across documents.

【5】 A White-Box SVM Framework and its Swarm-Based Optimization for Supervision of Toothed Milling Cutter through Characterization of Spindle Vibrations
Link: https://arxiv.org/abs/2112.08421
Authors: Tejas Y. Deo, Abhishek D. Patange, Sujit S. Pardeshi, R. Jegadeeshwaran, Apoorva N. Khairnar, Hrushikesh S. Khade
Abstract: In this paper, a white-box support vector machine (SVM) framework and its swarm-based optimization is presented for supervision of toothed milling cutter through characterization of real-time spindle vibrations. The anomalous moments of vibration evolved due to in-process tool failures (i.e., flank and nose wear, crater and notch wear, edge fracture) have been investigated through time-domain response of acceleration and statistical features. The Recursive Feature Elimination with Cross-Validation (RFECV) with decision trees as the estimator has been implemented for feature selection. Further, the competence of standard SVM has been examined for tool health monitoring, followed by its optimization through application of swarm-based algorithms. The comparative analysis of performance of five meta-heuristic algorithms (Elephant Herding Optimization, Monarch Butterfly Optimization, Harris Hawks Optimization, Slime Mould Algorithm, and Moth Search Algorithm) has been carried out. The white-box approach has been presented considering global and local representation that provides insight into the performance of machine learning models in tool condition monitoring.
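The feature-selection step described (RFECV with a decision tree as the estimator) maps directly onto scikit-learn; the feature matrix below is a placeholder for the statistical features extracted from the spindle-vibration signals, not the authors' dataset.

```python
from sklearn.feature_selection import RFECV
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold

def select_features(X, y):
    """X: (n_samples, n_statistical_features) from vibration signals,
    y: tool-condition labels (e.g. healthy / flank wear / edge fracture)."""
    selector = RFECV(estimator=DecisionTreeClassifier(random_state=0),
                     step=1,
                     cv=StratifiedKFold(5),
                     scoring="accuracy")
    selector.fit(X, y)
    # Boolean mask of retained features and the full elimination ranking.
    return selector.support_, selector.ranking_
```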

【6】 Performance or Trust? Why Not Both. Deep AUC Maximization with Self-Supervised Learning for COVID-19 Chest X-ray Classifications
Link: https://arxiv.org/abs/2112.08363
Authors: Siyuan He, Pengcheng Xi, Ashkan Ebadi, Stephane Tremblay, Alexander Wong
Abstract: Effective representation learning is the key in improving model performance for medical image analysis. In training deep learning models, a compromise often must be made between performance and trust, both of which are essential for medical applications. Moreover, models optimized with cross-entropy loss tend to suffer from unwarranted overconfidence in the majority class and over-cautiousness in the minority class. In this work, we integrate a new surrogate loss with self-supervised learning for computer-aided screening of COVID-19 patients using radiography images. In addition, we adopt a new quantification score to measure a model's trustworthiness. An ablation study is conducted for both performance and trust on feature learning methods and loss functions. Comparisons show that leveraging the new surrogate loss on self-supervised models can produce label-efficient networks that are both high-performing and trustworthy.

【7】 Objective Hearing Threshold Identification from Auditory Brainstem Response Measurements Using Supervised and Self-supervised Approaches
Link: https://arxiv.org/abs/2112.08961
Authors: Dominik Thalmeier, Gregor Miller, Elida Schneltzer, Anja Hurt, Martin Hrabě de Angelis, Lore Becker, Christian L. Müller, Holger Maier
Comments: 41 pages, 17 figures
Abstract: Hearing loss is a major health problem and psychological burden in humans. Mouse models offer a possibility to elucidate genes involved in the underlying developmental and pathophysiological mechanisms of hearing impairment. To this end, large-scale mouse phenotyping programs include auditory phenotyping of single-gene knockout mouse lines. Using the auditory brainstem response (ABR) procedure, the German Mouse Clinic and similar facilities worldwide have produced large, uniform data sets of averaged ABR raw data of mutant and wildtype mice. In the course of standard ABR analysis, hearing thresholds are assessed visually by trained staff from series of signal curves of increasing sound pressure level. This is time-consuming and prone to be biased by the reader as well as the graphical display quality and scale. In an attempt to reduce workload and improve quality and reproducibility, we developed and compared two methods for automated hearing threshold identification from averaged ABR raw data: a supervised approach involving two combined neural networks trained on human-generated labels and a self-supervised approach, which exploits the signal power spectrum and combines random forest sound level estimation with a piece-wise curve fitting algorithm for threshold finding. We show that both models work well, outperform human threshold detection, and are suitable for fast, reliable, and unbiased hearing threshold detection and quality control. In a high-throughput mouse phenotyping environment, both methods perform well as part of an automated end-to-end screening pipeline to detect candidate genes for hearing involvement. Code for both models as well as data used for this work are freely available.

【8】 Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification 标题:用于自监督说话人确认的Bootstrap均衡和概率说话人表征学习 链接:https://arxiv.org/abs/2112.08929

作者:Sung Hwan Mun,Min Hyun Han,Dongjune Lee,Jihwan Kim,Nam Soo Kim 备注:Accepted by IEEE Access 摘要:在本文中,我们提出了自监督说话人表示学习策略,包括前端的bootstrap均衡说话人表示学习和后端的不确定性感知概率说话人嵌入训练。在前端阶段,我们通过带一致性正则化项的自举训练方案学习说话人表示。在后端阶段,通过最大化属于同一说话人的语音样本之间的相互似然得分来估计概率说话人嵌入,这不仅提供说话人表示,而且提供数据不确定性。实验结果表明,所提出的bootstrap均衡训练策略能够有效地帮助学习说话人表征,并优于传统的基于对比学习的方法。此外,我们还证明了集成的两阶段框架进一步提高了VoxCeleb1测试集在EER和MinDCF方面的说话人验证性能。 摘要:In this paper, we propose self-supervised speaker representation learning strategies, which comprise of a bootstrap equilibrium speaker representation learning in the front-end and an uncertainty-aware probabilistic speaker embedding training in the back-end. In the front-end stage, we learn the speaker representations via the bootstrap training scheme with the uniformity regularization term. In the back-end stage, the probabilistic speaker embeddings are estimated by maximizing the mutual likelihood score between the speech samples belonging to the same speaker, which provide not only speaker representations but also data uncertainty. Experimental results show that the proposed bootstrap equilibrium training strategy can effectively help learn the speaker representations and outperforms the conventional methods based on contrastive learning. Also, we demonstrate that the integrated two-stage framework further improves the speaker verification performance on the VoxCeleb1 test set in terms of EER and MinDCF.

迁移|Zero/Few/One-Shot|自适应(6篇)

【1】 Domain Prompts: Towards memory and compute efficient domain adaptation of ASR systems 标题:领域提示:面向ASR系统的存储和计算高效的领域适配 链接:https://arxiv.org/abs/2112.08718

作者:Saket Dingliwal,Ashish Shenoy,Sravan Bodapati,Ankur Gandhe,Ravi Teja Gadde,Katrin Kirchhoff 备注:4 pages ICASSP submission 摘要:自动语音识别(ASR)系统已在许多不同领域的工业应用中得到应用。由于特定领域的系统在域内评估上比通用系统表现更好,因此对内存和计算高效的领域自适应有着明显的需求。特别是,对用于ASR假设重打分的参数量庞大的基于Transformer的语言模型进行自适应颇具挑战性。在这项工作中,我们介绍了领域提示(domain-prompts),这种方法只训练少量领域令牌嵌入参数,用以将基于Transformer的LM引导至特定领域。每个领域只需少量额外参数,我们就能在词错误率(WER)上相比未做自适应的LM基线取得7-14%的改进。尽管参数效率很高,但这些改进与具有数亿参数的完全微调模型相当。通过对提示长度、数据集规模、初始化方式和领域的消融实验,我们为在ASR系统中使用领域提示的好处提供了证据。 摘要:Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications in very diverse domains. Since domain-specific systems perform better than their generic counterparts on in-domain evaluation, the need for memory and compute-efficient domain adaptation is obvious. Particularly, adapting parameter-heavy transformer-based language models used for rescoring ASR hypothesis is challenging. In this work, we introduce domain-prompts, a methodology that trains a small number of domain token embedding parameters to prime a transformer-based LM to a particular domain. With just a handful of extra parameters per domain, we achieve 7-14% WER improvement over the baseline of using an unadapted LM. Despite being parameter-efficient, these improvements are comparable to those of fully-fine-tuned models with hundreds of millions of parameters. With ablations on prompt-sizes, dataset sizes, initializations and domains, we provide evidence for the benefits of using domain-prompts in ASR systems.
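摘要中"为每个领域训练少量领域令牌嵌入、再拼接到冻结的 Transformer LM 输入之前"的做法,可以用如下 PyTorch 草图示意。其中的基础 LM、嵌入维度与提示长度均为假设,仅演示参数高效的提示机制,并非论文的原始实现。

```python
import torch
import torch.nn as nn

class DomainPromptedLM(nn.Module):
    """在冻结的 Transformer LM 前拼接可训练的领域提示嵌入(示意)。"""
    def __init__(self, base_lm, embed_dim, prompt_len=10):
        super().__init__()
        self.base_lm = base_lm                      # 已预训练、参数冻结的 LM
        for p in self.base_lm.parameters():
            p.requires_grad = False
        self.domain_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, token_embeds):                # token_embeds: (B, T, D)
        B = token_embeds.size(0)
        prompt = self.domain_prompt.unsqueeze(0).expand(B, -1, -1)
        return self.base_lm(torch.cat([prompt, token_embeds], dim=1))

# 用一个极简的 Transformer 编码器代替真实的重打分 LM
base = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
model = DomainPromptedLM(base, embed_dim=64, prompt_len=10)
out = model(torch.randn(2, 20, 64))
print(out.shape)   # (2, 30, 64):前 10 个位置对应领域提示
```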

【2】 Extreme Zero-Shot Learning for Extreme Text Classification 标题:用于极端文本分类的极端零样本学习 链接:https://arxiv.org/abs/2112.08652

作者:Yuanhao Xiong,Wei-Cheng Chang,Cho-Jui Hsieh,Hsiang-Fu Yu,Inderjit Dhillon 备注:Our code is available at this https URL 摘要:极端多标签文本分类(XMC)问题涉及从大型标签集中为输入文本实例查找最相关的标签。然而,XMC设置面临两个挑战:(1)它无法泛化到动态环境中未见过的标签,(2)它需要大量有监督的(实例,标签)对,这对于新兴领域来说可能很难获得。最近,人们研究了广义零样本XMC(GZ-XMC)设置,并相应地提出了ZestXML来处理未见过的标签,但这仍然需要大量带标注的(实例,标签)对。在本文中,我们考虑一个更实际的场景,称为极端零样本XMC(Extreme Zero-Shot XMC,EZ-XMC),其中不需要监督,只有实例和标签的原始文本是可访问的。我们还研究了监督有限的EZ-XMC的一个扩展——少样本XMC(Few-Shot XMC,FS-XMC)。为了从原始文本中学习实例和标签的语义嵌入,我们建议使用自监督对比损失对基于Transformer的编码器进行预训练。具体而言,我们开发了一种预训练方法MACLR,该方法通过多尺度自适应聚类、标签正则化和伪正样本对自训练等技术充分利用原始文本。在四个公共EZ-XMC数据集上的实验结果表明,与所有其他领先的基线方法相比,MACLR实现了卓越的性能,平均在精确率和召回率上提高了约5-10%。此外,我们还表明,当训练中真实标注的正样本对数量有限时,我们的预训练编码器可以在FS-XMC上进一步改进。通过在这样的少样本子集上微调编码器,MACLR仍然显著优于其他极端分类器。 摘要:The eXtreme Multi-label text Classification (XMC) problem concerns finding most relevant labels for an input text instance from a large label set. However, the XMC setup faces two challenges: (1) it is not generalizable to predict unseen labels in dynamic environments, and (2) it requires a large amount of supervised (instance, label) pairs, which can be difficult to obtain for emerging domains. Recently, the generalized zero-shot XMC (GZ-XMC) setup has been studied and ZestXML is proposed accordingly to handle the unseen labels, which still requires a large number of annotated (instance, label) pairs. In this paper, we consider a more practical scenario called Extreme Zero-Shot XMC (EZ-XMC), in which no supervision is needed and merely raw text of instances and labels are accessible. Few-Shot XMC (FS-XMC), an extension to EZ-XMC with limited supervision is also investigated. To learn the semantic embeddings of instances and labels with raw text, we propose to pre-train Transformer-based encoders with self-supervised contrastive losses. Specifically, we develop a pre-training method MACLR, which thoroughly leverages the raw text with techniques including Multi-scale Adaptive Clustering, Label Regularization, and self-training with pseudo positive pairs. Experimental results on four public EZ-XMC datasets demonstrate that MACLR achieves superior performance compared to all other leading baseline methods, in particular with approximately 5-10% improvement in precision and recall on average. Moreover, we also show that our pre-trained encoder can be further improved on FS-XMC when there are a limited number of ground-truth positive pairs in training. By fine-tuning the encoder on such a few-shot subset, MACLR still outperforms other extreme classifiers significantly.

【3】 UMAD: Universal Model Adaptation under Domain and Category Shift 标题:UMAD:领域和类别转换下的通用模型适应 链接:https://arxiv.org/abs/2112.08553

作者:Jian Liang,Dapeng Hu,Jiashi Feng,Ran He 摘要:学习拒绝目标域中的未知样本(源类中不存在)对于无监督域自适应(UDA)非常重要。存在两种典型的UDA场景,即开放集和开放部分集,后者假设并非所有源类都出现在目标域中。然而,大多数以前的方法都是为一个UDA场景设计的,并且在另一个UDA场景中的性能总是很差。此外,它们在适应过程中还需要标记的源数据,这限制了它们在数据隐私敏感应用程序中的可用性。为了解决这些问题,本文提出了一个通用模型适应(UMAD)框架,该框架既可以处理UDA场景,又不需要访问源数据,也不需要事先了解域之间的类别转换。具体来说,我们的目标是学习一个源模型和一个精心设计的双头分类器,并将其提供给目标域。在适应过程中,我们开发了一个信息一致性评分,以帮助区分未知样本和已知样本。为了在目标域实现双边自适应,我们进一步最大化局部互信息,使已知样本与源分类器对齐,并分别利用熵损失将未知样本推离源分类边界。在开放集和开放部分集UDA场景上的实验表明,UMAD作为一种不访问源数据的统一方法,其性能与最先进的依赖数据的方法相当,甚至更高。 摘要:Learning to reject unknown samples (not present in the source classes) in the target domain is fairly important for unsupervised domain adaptation (UDA). There exist two typical UDA scenarios, i.e., open-set, and open-partial-set, and the latter assumes that not all source classes appear in the target domain. However, most prior methods are designed for one UDA scenario and always perform badly on the other UDA scenario. Moreover, they also require the labeled source data during adaptation, limiting their usability in data privacy-sensitive applications. To address these issues, this paper proposes a Universal Model ADaptation (UMAD) framework which handles both UDA scenarios without access to the source data nor prior knowledge about the category shift between domains. Specifically, we aim to learn a source model with an elegantly designed two-head classifier and provide it to the target domain. During adaptation, we develop an informative consistency score to help distinguish unknown samples from known samples. To achieve bilateral adaptation in the target domain, we further maximize localized mutual information to align known samples with the source classifier and employ an entropic loss to push unknown samples far away from the source classification boundary, respectively. Experiments on open-set and open-partial-set UDA scenarios demonstrate that UMAD, as a unified approach without access to source data, exhibits comparable, if not superior, performance to state-of-the-art data-dependent methods.

【4】 FLoRA: Single-shot Hyper-parameter Optimization for Federated Learning 标题:FLORA:面向联邦学习的单次超参数优化 链接:https://arxiv.org/abs/2112.08524

作者:Yi Zhou,Parikshit Ram,Theodoros Salonidis,Nathalie Baracaldo,Horst Samulowitz,Heiko Ludwig 摘要:我们解决了联邦学习(FL-HPO)的超参数优化(HPO)这一相对未被探索的问题。我们介绍了联邦损失面聚合(FLoRA),这是第一个FL-HPO解决方案框架,除了FL文献中常见的随机梯度下降/神经网络外,它还可以解决表格数据和梯度增强训练算法的使用情况。该框架通过首先识别**单个**FL训练中使用的一组好的超参数,实现单次FL-HPO。因此,与没有HPO的FL训练相比,它能够以最小的额外通信开销实现FL-HPO解决方案。我们对七个OpenML数据集上梯度增强决策树的FLoRA进行的实证评估表明,与所考虑的基线相比,模型精度显著提高,并且对参与FL-HPO训练的人数不断增加具有鲁棒性。 摘要:We address the relatively unexplored problem of hyper-parameter optimization (HPO) for federated learning (FL-HPO). We introduce Federated Loss suRface Aggregation (FLoRA), the first FL-HPO solution framework that can address use cases of tabular data and gradient boosting training algorithms in addition to stochastic gradient descent/neural networks commonly addressed in the FL literature. The framework enables single-shot FL-HPO, by first identifying a good set of hyper-parameters that are used in a **single** FL training. Thus, it enables FL-HPO solutions with minimal additional communication overhead compared to FL training without HPO. Our empirical evaluation of FLoRA for Gradient Boosted Decision Trees on seven OpenML data sets demonstrates significant model accuracy improvements over the considered baseline, and robustness to increasing number of parties involved in FL-HPO training.
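摘要中"单次 FL-HPO:先聚合各参与方的损失面、再选出一组好的超参数用于唯一一次联邦训练"的思路,可以用下面的极简草图示意。其中用各方本地验证损失的加权平均代表"损失面聚合",这只是众多可能的聚合方式之一,并非 FLoRA 的确切算法。

```python
import numpy as np

def flora_single_shot_hpo(local_losses, party_weights=None):
    """
    local_losses:  (P, C) 每个参与方对 C 个候选超参数配置的本地验证损失
    party_weights: (P,)   各方权重(例如按样本量),默认为等权
    返回聚合损失面下最优配置的索引,随后只需用它进行一次联邦训练。
    """
    local_losses = np.asarray(local_losses, dtype=float)
    if party_weights is None:
        party_weights = np.ones(local_losses.shape[0])
    weights = np.asarray(party_weights, dtype=float)
    weights = weights / weights.sum()
    aggregated = weights @ local_losses          # (C,) 聚合后的损失面
    return int(np.argmin(aggregated)), aggregated

candidates = [{"lr": 0.1, "depth": 3}, {"lr": 0.1, "depth": 6}, {"lr": 0.3, "depth": 3}]
losses = [[0.42, 0.35, 0.50],    # 参与方 1
          [0.40, 0.31, 0.47]]    # 参与方 2
best, surface = flora_single_shot_hpo(losses, party_weights=[100, 300])
print("选中的超参数:", candidates[best], "聚合损失面:", surface)
```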

【5】 Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization 标题:统计分析与奖励折衷的适应性实验算法--均匀随机分配与奖励最大化相结合 链接:https://arxiv.org/abs/2112.08507

作者:Jacob Nogas,Tong Li,Fernando J. Yanez,Arghavan Modiri,Nina Deliu,Ben Prystawski,Sofia S. Villar,Anna Rafferty,Joseph J. Williams 摘要:像汤普森采样这样的多臂bandit算法可以用来进行自适应实验,在这种实验中,最大化奖励意味着数据被用来逐步将更多参与者分配到更有效的臂。这样的分配策略会增加统计假设检验的风险:当两个臂之间没有差异时错误地检出差异,而当确实存在差异时却无法得出存在差异的结论。我们对两臂实验进行了模拟,探索了两种算法,它们结合了均匀随机化对统计分析的好处与汤普森采样(TS)带来的奖励最大化的好处。第一种是Top-Two汤普森采样,它在整个实验期间均匀地加入固定比例的均匀随机分配(UR)。第二种是一种新的启发式算法,称为TS PostDiff(差异的后验概率)。TS PostDiff采用贝叶斯方法混合TS和UR:参与者被以UR方式分配的概率等于两臂差异"小"(低于某个阈值)的后验概率,从而在几乎没有额外奖励可获得时允许更多的UR探索。我们发现TS PostDiff方法在多种效应量下均表现良好,因此不需要基于对真实效应量的猜测进行调参。 摘要:Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments, in which maximizing reward means that data is used to progressively assign more participants to more effective arms. Such assignment strategies increase the risk of statistical hypothesis tests identifying a difference between arms when there is not one, and failing to conclude there is a difference in arms when there truly is one. We present simulations for 2-arm experiments that explore two algorithms that combine the benefits of uniform randomization for statistical analysis, with the benefits of reward maximization achieved by Thompson Sampling (TS). First, Top-Two Thompson Sampling adds a fixed amount of uniform random allocation (UR) spread evenly over time. Second, a novel heuristic algorithm, called TS PostDiff (Posterior Probability of Difference). TS PostDiff takes a Bayesian approach to mixing TS and UR: the probability a participant is assigned using UR allocation is the posterior probability that the difference between two arms is `small' (below a certain threshold), allowing for more UR exploration when there is little or no reward to be gained. We find that TS PostDiff method performs well across multiple effect sizes, and thus does not require tuning based on a guess for the true effect size.
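TS PostDiff 的核心规则在摘要中描述得比较具体:以"两臂差异小于阈值"的后验概率决定是否采用均匀随机分配,否则按 Thompson 采样分配。下面是一个两臂伯努利奖励、Beta(1,1) 先验下的示意实现,阈值与蒙特卡罗采样次数均为假设取值。

```python
import numpy as np

rng = np.random.default_rng(0)

def assign_arm_ts_postdiff(successes, failures, threshold=0.05, n_mc=2000):
    """TS PostDiff 的示意实现(两臂伯努利奖励,Beta(1,1) 先验)。
    先用蒙特卡罗估计 P(|p1 - p2| < threshold | 数据),即差异"小"的后验概率;
    以该概率做均匀随机分配(UR),否则按 Thompson 采样分配。"""
    post = [rng.beta(successes[a] + 1, failures[a] + 1, size=n_mc) for a in (0, 1)]
    p_small_diff = np.mean(np.abs(post[0] - post[1]) < threshold)
    if rng.random() < p_small_diff:                  # UR 分支
        return int(rng.integers(0, 2))
    draws = [rng.beta(successes[a] + 1, failures[a] + 1) for a in (0, 1)]
    return int(np.argmax(draws))                     # TS 分支

# 模拟一个小型两臂实验
true_p = [0.5, 0.6]
s, f = [0, 0], [0, 0]
for _ in range(500):
    arm = assign_arm_ts_postdiff(s, f)
    reward = rng.random() < true_p[arm]
    s[arm] += int(reward)
    f[arm] += int(not reward)
print("各臂分配次数:", [s[0] + f[0], s[1] + f[1]])
```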

【6】 Adaptation and Attention for Neural Video Coding 标题:面向神经视频编码的自适应与注意力机制 链接:https://arxiv.org/abs/2112.08767

作者:Nannan Zou,Honglei Zhang,Francesco Cricri,Ramin G. Youvalari,Hamed R. Tavakoli,Jani Lainema,Emre Aksu,Miska Hannuksela,Esa Rahtu 摘要:神经图像编码代表了目前最先进的图像压缩方法。然而,在视频领域还有很多工作要做。在这项工作中,我们提出了一种端到端学习的视频编解码器,该编解码器围绕自适应和注意力的概念引入了若干架构上与训练上的新颖之处。我们的编解码器由一个帧内编解码器和一个帧间编解码器组成。作为一种架构上的创新,我们建议训练帧间编解码器模型,使其根据输入视频的分辨率自适应地调整运动估计过程。第二个架构创新是一个新的神经模块,它结合了分裂注意力(split-attention)神经网络和DenseNets的思想。最后,我们建议在推断时对一组解码器端乘性参数进行过拟合。通过消融研究和与现有技术的比较,我们展示了所提技术在编码增益方面的优势。我们将我们的编解码器与分别代表最先进传统编解码器和端到端学习编解码器的VVC/H.266和RLVC进行比较,并与2021年CLIC竞赛中表现最好的端到端学习方法E2E_T_OL进行比较。我们的编解码器明显优于E2E_T_OL,并且在部分设置下与VVC和RLVC相比也具有竞争力。 摘要:Neural image coding represents now the state-of-the-art image compression approach. However, a lot of work is still to be done in the video domain. In this work, we propose an end-to-end learned video codec that introduces several architectural novelties as well as training novelties, revolving around the concepts of adaptation and attention. Our codec is organized as an intra-frame codec paired with an inter-frame codec. As one architectural novelty, we propose to train the inter-frame codec model to adapt the motion estimation process based on the resolution of the input video. A second architectural novelty is a new neural block that combines concepts from split-attention based neural networks and from DenseNets. Finally, we propose to overfit a set of decoder-side multiplicative parameters at inference time. Through ablation studies and comparisons to prior art, we show the benefits of our proposed techniques in terms of coding gains. We compare our codec to VVC/H.266 and RLVC, which represent the state-of-the-art traditional and end-to-end learned codecs, respectively, and to the top performing end-to-end learned approach in 2021 CLIC competition, E2E_T_OL. Our codec clearly outperforms E2E_T_OL, and compare favorably to VVC and RLVC in some settings.

强化学习(3篇)

【1】 Centralizing State-Values in Dueling Networks for Multi-Robot Reinforcement Learning Mapless Navigation 标题:面向多机器人强化学习无地图导航的决斗网络状态值集中 链接:https://arxiv.org/abs/2112.09012

作者:Enrico Marchesini,Alessandro Farinelli 备注:6 pages, 5 figures, 1 table. Accepted at IROS 2021 摘要:我们研究了流行的集中训练和分散执行(CTDE)模式下的多机器人mapless导航问题。当每个机器人考虑其路径而不与其他机器人明确共享观测值时,该问题具有挑战性,并可能导致深度强化学习(DRL)中的非平稳问题。典型的CTDE算法将联合行动价值函数分解为单独的行动价值函数,以利于合作并实现分散执行。这种因式分解涉及限制个体中出现新行为的约束(例如,单调性),因为每个代理都是从联合动作值开始训练的。相比之下,我们提出了一种新的CTDE体系结构,该体系结构使用集中式状态值网络来计算联合状态值,用于在基于值的代理更新中注入全局状态信息。因此,考虑到环境的整体状态,每个模型计算其权重的梯度更新。我们的想法遵循了决斗网络的观点,因为对关节状态值的单独估计既有提高样本效率的优势,又能为每个机器人提供全局状态是否有价值的信息。在2个、4个和8个机器人的机器人导航任务中进行的实验,证实了我们的方法比以前的CTDE方法(例如VDN、QMIX)具有更高的性能。 摘要:We study the problem of multi-robot mapless navigation in the popular Centralized Training and Decentralized Execution (CTDE) paradigm. This problem is challenging when each robot considers its path without explicitly sharing observations with other robots and can lead to non-stationary issues in Deep Reinforcement Learning (DRL). The typical CTDE algorithm factorizes the joint action-value function into individual ones, to favor cooperation and achieve decentralized execution. Such factorization involves constraints (e.g., monotonicity) that limit the emergence of novel behaviors in an individual as each agent is trained starting from a joint action-value. In contrast, we propose a novel architecture for CTDE that uses a centralized state-value network to compute a joint state-value, which is used to inject global state information in the value-based updates of the agents. Consequently, each model computes its gradient update for the weights, considering the overall state of the environment. Our idea follows the insights of Dueling Networks as a separate estimation of the joint state-value has both the advantage of improving sample efficiency, while providing each robot information whether the global state is (or is not) valuable. Experiments in a robotic navigation task with 2 4, and 8 robots, confirm the superior performance of our approach over prior CTDE methods (e.g., VDN, QMIX).

【2】 Learning to Share in Multi-Agent Reinforcement Learning 标题:多智能体强化学习中的共享学习 链接:https://arxiv.org/abs/2112.08702

作者:Yuxuan Yi,Ge Li,Yaowei Wang,Zongqing Lu 摘要:在本文中,我们研究了网络化多agent强化学习(MARL)问题,其中多个agent被部署为一个部分连接的网络,每个agent只与附近的agent交互。网络化MARL要求所有代理以分散的方式进行决策,以优化网络上邻居之间通信受限的全局目标。受共享在人类学习合作中起着关键作用这一事实的启发,我们提出了LToS,这是一个分层分散的MARL框架,使代理能够学习与邻居动态共享奖励,从而鼓励代理在全球目标上合作。对于每个agent,高级策略学习如何与邻居共享奖励以分解全局目标,而低级策略学习优化邻居中高级策略诱导的局部目标。这两种策略形成了一个双层优化和交替学习。我们的经验表明,LToS在社会困境和网络化MARL场景中都优于现有方法。 摘要:In this paper, we study the problem of networked multi-agent reinforcement learning (MARL), where a number of agents are deployed as a partially connected network and each interacts only with nearby agents. Networked MARL requires all agents make decision in a decentralized manner to optimize a global objective with restricted communication between neighbors over the network. Inspired by the fact that extit{sharing} plays a key role in human's learning of cooperation, we propose LToS, a hierarchically decentralized MARL framework that enables agents to learn to dynamically share reward with neighbors so as to encourage agents to cooperate on the global objective. For each agent, the high-level policy learns how to share reward with neighbors to decompose the global objective, while the low-level policy learns to optimize local objective induced by the high-level policies in the neighborhood. The two policies form a bi-level optimization and learn alternately. We empirically demonstrate that LToS outperforms existing methods in both social dilemma and networked MARL scenario.

【3】 Feature-Attending Recurrent Modules for Generalization in Reinforcement Learning 标题:用于强化学习泛化的特征注意循环模块 链接:https://arxiv.org/abs/2112.08369

作者:Wilka Carvalho,Andrew Lampinen,Kyriacos Nikiforou,Felix Hill,Murray Shanahan 摘要:深度强化学习(Deep RL)最近在开发泛化算法方面取得了重大进展。然而,大多数算法只针对单一类型的泛化设置。在这项工作中,我们研究了三种不同任务结构下的泛化:(a)由规律出现的物体运动在空间和时间上组合而成的任务;(b)由对规律出现的3D对象进行主动感知和导航组成的任务;(c)由在规律出现的对象配置序列上记忆目标信息组成的任务。这些不同的任务结构共享一个组合性的基本思想:完成任务总是需要把任务导向的感知与行为中反复出现的片段组合起来。我们假设,如果代理能够发现捕获这些重复任务片段的表示,那么它就能在该任务结构内进行泛化。对于我们的任务,这对应于识别单个对象运动、导航到三维对象以及在对象配置间导航的表示。受认知科学的启发,我们将代理经验中反复出现的片段的表示称为"知觉图式"。我们提出了特征注意循环模块(Feature-Attending Recurrent Modules,FARM),它学习一种状态表示,其中知觉图式分布在多个相对较小的循环模块上。我们将FARM与利用空间注意力的循环架构进行比较,后者将观测特征压缩为各空间位置上的加权平均。我们的实验表明,我们的特征注意机制能使FARM在我们所研究的各种以对象为中心的领域中更好地泛化。 摘要:Deep reinforcement learning (Deep RL) has recently seen significant progress in developing algorithms for generalization. However, most algorithms target a single type of generalization setting. In this work, we study generalization across three disparate task structures: (a) tasks composed of spatial and temporal compositions of regularly occurring object motions; (b) tasks composed of active perception of and navigation towards regularly occurring 3D objects; and (c) tasks composed of remembering goal-information over sequences of regularly occurring object-configurations. These diverse task structures all share an underlying idea of compositionality: task completion always involves combining recurring segments of task-oriented perception and behavior. We hypothesize that an agent can generalize within a task structure if it can discover representations that capture these recurring task-segments. For our tasks, this corresponds to representations for recognizing individual object motions, for navigation towards 3D objects, and for navigating through object-configurations. Taking inspiration from cognitive science, we term representations for recurring segments of an agent's experience, "perceptual schemas". We propose Feature Attending Recurrent Modules (FARM), which learns a state representation where perceptual schemas are distributed across multiple, relatively small recurrent modules. We compare FARM to recurrent architectures that leverage spatial attention, which reduces observation features to a weighted average over spatial positions. Our experiments indicate that our feature-attention mechanism better enables FARM to generalize across the diverse object-centric domains we study.

符号|符号学习(1篇)

【1】 How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy 标题:如何学习和表现抽象:运用符号炼金术的研究 链接:https://arxiv.org/abs/2112.08360

作者:Badr AlKhamissi,Akshay Srinivasan,Zeb-Kurth Nelson,Sam Ritter 备注:Preprint 摘要:Alchemy是一个新的元学习环境,它足够丰富,可以包含有趣的抽象,但也足够简单,可以使细粒度分析变得易于处理。此外,Alchemy提供了一个可选的符号接口,使meta RL研究无需大量计算预算。在这项工作中,我们采取了第一步,使用符号炼金术来确定设计选择,使深度RL代理能够学习各种类型的抽象。然后,通过各种行为和内省分析,我们调查了我们训练有素的代理如何使用和表示抽象任务变量,并发现了与抽象神经科学的有趣联系。最后,我们讨论了使用meta RL和炼金术更好地理解抽象变量在大脑中的表现的下一步。 摘要:Alchemy is a new meta-learning environment rich enough to contain interesting abstractions, yet simple enough to make fine-grained analysis tractable. Further, Alchemy provides an optional symbolic interface that enables meta-RL research without a large compute budget. In this work, we take the first steps toward using Symbolic Alchemy to identify design choices that enable deep-RL agents to learn various types of abstraction. Then, using a variety of behavioral and introspective analyses we investigate how our trained agents use and represent abstract task variables, and find intriguing connections to the neuroscience of abstraction. We conclude by discussing the next steps for using meta-RL and Alchemy to better understand the representation of abstract variables in the brain.

医学相关(6篇)

【1】 A molecular generative model with genetic algorithm and tree search for cancer samples 标题:基于遗传算法和树搜索的癌症样本分子生成模型 链接:https://arxiv.org/abs/2112.08959

作者:Sejin Park,Hyunju Lee 摘要:通过基于患者的基因特征对患者进行治疗,个性化医疗有望最大限度地发挥预期的药物作用,并将副作用降至最低。因此,根据疾病的基因图谱来生成药物,特别是在抗癌药物的发现中,是非常重要的。然而,这是一个挑战,因为巨大的化学空间和癌症性质的多样性使得寻找合适的分子需要大量时间。因此,在抗癌药物的从头分子设计中,需要一种考虑基因图谱的高效、快速的搜索方法。在此,我们提出了一种基于遗传算法和面向癌症样本的树搜索的快速分子生成模型(FasterGTS)。FasterGTS由遗传算法、蒙特卡罗树搜索以及三个深度神经网络(监督学习网络、自训练网络和价值网络)构成,它根据癌症样本的基因图谱生成抗癌分子。与其他方法相比,FasterGTS在有限的采样次数内生成了具有癌症药物所需一般化学性质的癌症样本特异性分子。我们期望FasterGTS有助于抗癌药物的生成。 摘要:Personalized medicine is expected to maximize the intended drug effects and minimize side effects by treating patients based on their genetic profiles. Thus, it is important to generate drugs based on the genetic profiles of diseases, especially in anticancer drug discovery. However, this is challenging because the vast chemical space and variations in cancer properties require a huge time resource to search for proper molecules. Therefore, an efficient and fast search method considering genetic profiles is required for de novo molecular design of anticancer drugs. Here, we propose a faster molecular generative model with genetic algorithm and tree search for cancer samples (FasterGTS). FasterGTS is constructed with a genetic algorithm and a Monte Carlo tree search with three deep neural networks: supervised learning, self-trained, and value networks, and it generates anticancer molecules based on the genetic profiles of a cancer sample. When compared to other methods, FasterGTS generated cancer sample-specific molecules with general chemical properties required for cancer drugs within the limited numbers of samplings. We expect that FasterGTS contributes to the anticancer drug generation.

【2】 COVID-19 Electrocardiograms Classification using CNN Models 标题:基于CNN模型的冠状病毒心电图分类 链接:https://arxiv.org/abs/2112.08931

作者:Ismail Shahin,Ali Bou Nassif,Mohamed Bader Alsabek 备注:5 pages, 4 figures, accepted in the 14th International Conference on Developments in eSystems Engineering, 7-10 December, 2021 摘要:随着COVID-19的周期性上升和下降以及许多国家受到其影响的影响,全世界科学家、研究者和医生已经做了大量的工作。迫切需要及时干预,以应对该疾病的不合理传播。通过应用深度学习算法的基础知识,人工智能(AI)的实现为数字健康区做出了重大贡献。在2019冠状病毒疾病诊断中,提出了一种新的方法,即利用深度学习算法,特别是卷积神经网络(CNN)模型,利用心电图数据自动诊断COVID-19。该框架中使用了几个CNN模型,包括VGG16、VGG19、InceptionResnetv2、InceptionV3、Resnet50和Densenet201。VGG16模型优于其他模型,准确率为85.92%。我们的结果表明,与VGG16模型相比,其余模型的精度相对较低,这是由于所使用的数据集较小,此外,仅对VGG16模型使用网格搜索超参数优化方法。此外,我们的结果是预备性的,并且有可能通过进一步扩展数据集和采用合适的超参数优化技术来提高所有模型的准确性。 摘要:With the periodic rise and fall of COVID-19 and numerous countries being affected by its ramifications, there has been a tremendous amount of work that has been done by scientists, researchers, and doctors all over the world. Prompt intervention is keenly needed to tackle the unconscionable dissemination of the disease. The implementation of Artificial Intelligence (AI) has made a significant contribution to the digital health district by applying the fundamentals of deep learning algorithms. In this study, a novel approach is proposed to automatically diagnose the COVID-19 by the utilization of Electrocardiogram (ECG) data with the integration of deep learning algorithms, specifically the Convolutional Neural Network (CNN) models. Several CNN models have been utilized in this proposed framework, including VGG16, VGG19, InceptionResnetv2, InceptionV3, Resnet50, and Densenet201. The VGG16 model has outperformed the rest of the models, with an accuracy of 85.92%. Our results show a relatively low accuracy in the rest of the models compared to the VGG16 model, which is due to the small size of the utilized dataset, in addition to the exclusive utilization of the Grid search hyperparameters optimization approach for the VGG16 model only. Moreover, our results are preparatory, and there is a possibility to enhance the accuracy of all models by further expanding the dataset and adapting a suitable hyperparameters optimization technique.
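摘要中表现最好的 VGG16 属于标准的迁移学习用法:加载 ImageNet 预训练权重、替换分类头后在心电图图像上微调。下面给出一个 PyTorch/torchvision 草图,假设使用较新版本 torchvision 的 weights 接口,且简化为二分类;论文的具体训练细节(数据增强、网格搜索超参数等)未包含。

```python
import torch
import torch.nn as nn
from torchvision import models

# 加载 ImageNet 预训练的 VGG16(需较新版本 torchvision 的 weights 接口)
model = models.vgg16(weights="IMAGENET1K_V1")

# 冻结卷积特征提取部分,仅微调分类头
for p in model.features.parameters():
    p.requires_grad = False

# 将最后一层全连接替换为二分类输出(COVID-19 / 非 COVID-19,类别数为假设)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)

# 用随机张量代替心电图图像批次,演示一次训练步骤
images, labels = torch.randn(4, 3, 224, 224), torch.randint(0, 2, (4,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(loss.item())
```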

【3】 Multiple Instance Learning for Brain Tumor Detection from Magnetic Resonance Spectroscopy Data 标题:磁共振波谱数据的多示例学习在脑肿瘤检测中的应用 链接:https://arxiv.org/abs/2112.08845

作者:Diyuan Lu,Gerhard Kurz,Nenad Polomac,Iskra Gacheva,Elke Hattingen,Jochen Triesch 摘要:我们应用深度学习(DL)对磁共振波谱(MRS)数据进行脑肿瘤检测。医疗应用程序经常遭受数据稀缺和噪声破坏的困扰。这两个问题在我们的数据集中都很突出。此外,不同的患者可获得不同数量的光谱。我们通过将任务视为多实例学习(MIL)问题来解决这些问题。具体而言,我们将来自同一患者的多个光谱聚合到一个“袋子”中进行分类,并应用数据增强技术。为了实现装袋过程中的排列不变性,我们提出了两种方法:(1)对一个袋子中所有样本的特征应用最小、最大和平均池;(2)应用注意机制。我们在多个神经网络结构上测试了这两种方法。我们证明了在多个实例而不是单个光谱上进行训练时,分类性能显著提高。我们提出了一种简单的过采样数据增强方法,并表明它可以进一步提高性能。最后,我们证明,根据大多数性能指标,我们提出的模型优于神经放射科医生手动分类。 摘要:We apply deep learning (DL) on Magnetic resonance spectroscopy (MRS) data for the task of brain tumor detection. Medical applications often suffer from data scarcity and corruption by noise. Both of these problems are prominent in our data set. Furthermore, a varying number of spectra are available for the different patients. We address these issues by considering the task as a multiple instance learning (MIL) problem. Specifically, we aggregate multiple spectra from the same patient into a "bag" for classification and apply data augmentation techniques. To achieve the permutation invariance during the process of bagging, we proposed two approaches: (1) to apply min-, max-, and average-pooling on the features of all samples in one bag and (2) to apply an attention mechanism. We tested these two approaches on multiple neural network architectures. We demonstrate that classification performance is significantly improved when training on multiple instances rather than single spectra. We propose a simple oversampling data augmentation method and show that it could further improve the performance. Finally, we demonstrate that our proposed model outperforms manual classification by neuroradiologists according to most performance metrics.
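摘要中的方法(2)即对一个"袋"内的多条谱使用注意力机制得到排列不变的袋级表示。下面是一个注意力 MIL 池化的 PyTorch 草图:假设每条谱已由特征提取器映射为固定维度向量,各维度大小均为示意取值。

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """对一个"袋"(同一患者的多条 MRS 谱)做注意力池化后分类的示意实现。"""
    def __init__(self, in_dim=128, hidden=64, n_classes=2):
        super().__init__()
        self.attention = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.classifier = nn.Linear(in_dim, n_classes)

    def forward(self, bag):                                    # bag: (N, D),N 为袋内谱的数量
        weights = torch.softmax(self.attention(bag), dim=0)    # (N, 1) 每条谱的注意力权重
        pooled = (weights * bag).sum(dim=0)                    # (D,) 排列不变的袋级表示
        return self.classifier(pooled), weights

model = AttentionMIL()
logits, w = model(torch.randn(7, 128))                         # 一个包含 7 条谱的袋
print(logits.shape, w.squeeze(-1))
```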

【4】 Search for temporal cell segmentation robustness in phase-contrast microscopy videos 标题:相衬显微镜视频中时间细胞分割鲁棒性的研究 链接:https://arxiv.org/abs/2112.08817

作者:Estibaliz Gómez-de-Mariscal,Hasini Jayatilaka,Özgün Çiçek,Thomas Brox,Denis Wirtz,Arrate Muñoz-Barrutia 摘要:研究细胞形态随时间的变化对于理解细胞迁移机制至关重要。在这项工作中,我们提出了一个基于深度学习的工作流程,以分割嵌入三维胶原基质中的癌细胞,并用相差显微镜成像。我们的方法使用转移学习和循环卷积长短时记忆单元来利用过去的时间信息,并提供一致的分割结果。最后,我们提出了一种研究癌细胞形态的几何表征方法。我们的方法在时间上提供了稳定的结果,并且对不同的权重初始化或训练数据采样具有鲁棒性。我们引入了一个新的用于二维细胞分割和跟踪的带注释数据集,以及一个开源实现来复制实验或使其适应新的图像处理问题。 摘要:Studying cell morphology changes in time is critical to understanding cell migration mechanisms. In this work, we present a deep learning-based workflow to segment cancer cells embedded in 3D collagen matrices and imaged with phase-contrast microscopy. Our approach uses transfer learning and recurrent convolutional long-short term memory units to exploit the temporal information from the past and provide a consistent segmentation result. Lastly, we propose a geometrical-characterization approach to studying cancer cell morphology. Our approach provides stable results in time, and it is robust to the different weight initialization or training data sampling. We introduce a new annotated dataset for 2D cell segmentation and tracking, and an open-source implementation to replicate the experiments or adapt them to new image processing problems.

【5】 CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain 标题:CLIN-X:临床领域概念提取的预训练语言模型和跨任务迁移研究 链接:https://arxiv.org/abs/2112.08754

作者:Lukas Lange,Heike Adel,Jannik Strötgen,Dietrich Klakow 摘要:自然语言处理(NLP)领域最近发生了巨大的变化,使用预先训练好的语言模型来解决几乎所有的任务。尽管在各种任务的基准数据集方面有了很大的改进,但这些模型在非标准领域(如临床领域)的表现往往是次优的,在这些领域中,训练前文档和目标文档之间存在很大差距。在本文中,我们旨在通过特定领域的语言模型训练来弥补这一差距,并研究其对一系列下游任务和环境的影响。我们介绍了预训练的CLIN-X(Clinical XLM-R)语言模型,并展示了CLIN-X如何在两种语言的十项临床概念提取任务中大大优于其他预训练的transformer模型。此外,我们还演示了如何使用我们提出的基于随机拆分和跨句子上下文的集成的任务和语言不可知模型架构进一步改进transformer模型。我们在低资源和迁移设置下的研究显示,尽管缺少标注数据,模型性能依然稳定,当只有250个标注句子可用时,改进高达47个F1点。我们的结果强调了特殊语言模型(如CLIN-X)在非标准领域概念提取中的重要性,但也表明我们的任务不可知模型体系结构在测试任务和语言中是健壮的,因此不需要领域或任务特定的调整。CLIN-X语言模型以及用于微调和迁移模型的源代码公开于 https://github.com/boschresearch/clin_x/ 和 Hugging Face 模型库(model hub)。 摘要:The field of natural language processing (NLP) has recently seen a large change towards using pre-trained language models for solving almost any task. Despite showing great improvements in benchmark datasets for various tasks, these models often perform sub-optimal in non-standard domains like the clinical domain where a large gap between pre-training documents and target documents is observed. In this paper, we aim at closing this gap with domain-specific training of the language model and we investigate its effect on a diverse set of downstream tasks and settings. We introduce the pre-trained CLIN-X (Clinical XLM-R) language models and show how CLIN-X outperforms other pre-trained transformer models by a large margin for ten clinical concept extraction tasks from two languages. In addition, we demonstrate how the transformer model can be further improved with our proposed task- and language-agnostic model architecture based on ensembles over random splits and cross-sentence context. Our studies in low-resource and transfer settings reveal stable model performance despite a lack of annotated data with improvements of up to 47 F1 points when only 250 labeled sentences are available. Our results highlight the importance of specialized language models as CLIN-X for concept extraction in non-standard domains, but also show that our task-agnostic model architecture is robust across the tested tasks and languages so that domain- or task-specific adaptations are not required. The CLIN-X language models and source code for fine-tuning and transferring the model are publicly available at https://github.com/boschresearch/clin_x/ and the huggingface model hub.

【6】 Quality monitoring of federated Covid-19 lesion segmentation 标题:联合冠状病毒病变分割的质量监测 链接:https://arxiv.org/abs/2112.08974

作者:Camila Gonzalez,Christian Harder,Amin Ranem,Ricarda Fischbach,Isabel Kaltenborn,Armin Dadras,Andreas Bucher,Anirban Mukhopadhyay 摘要:联邦学习是训练健壮的深度学习模型以分割胸部CT中新冠病毒-19相关发现的最有希望的方法。通过分散式学习,可以利用各种来源和采集协议的异构数据,同时确保患者隐私。然而,持续监控模型的性能至关重要。然而,当涉及到弥漫性肺部病变的分割时,快速目视检查不足以评估质量,由专家放射科医生对所有网络输出进行彻底监测是不可行的。在这项工作中,我们提出了一系列轻量级指标,这些指标可以在每个医院本地计算,然后聚合起来用于联邦系统的集中监控。我们的线性模型在分布外数据集上检测到70%以上的低质量分段,因此可靠地表示模型性能下降。 摘要:Federated Learning is the most promising way to train robust Deep Learning models for the segmentation of Covid-19-related findings in chest CTs. By learning in a decentralized fashion, heterogeneous data can be leveraged from a variety of sources and acquisition protocols whilst ensuring patient privacy. It is, however, crucial to continuously monitor the performance of the model. Yet when it comes to the segmentation of diffuse lung lesions, a quick visual inspection is not enough to assess the quality, and thorough monitoring of all network outputs by expert radiologists is not feasible. In this work, we present an array of lightweight metrics that can be calculated locally in each hospital and then aggregated for central monitoring of a federated system. Our linear model detects over 70% of low-quality segmentations on an out-of-distribution dataset and thus reliably signals a decline in model performance.

推荐(1篇)

【1】 SanMove: Next Location Recommendation via Self-Attention Network 标题:SanMove:基于自注意力网络的下一位置推荐 链接:https://arxiv.org/abs/2112.09076

作者:Huifeng Li,Bin Wang,Sulei Zhu,Yanyan Xu 摘要:目前,下一个位置推荐在基于位置的社交网络应用和服务中起着至关重要的作用。虽然已经提出了许多方法来解决这个问题,但到目前为止,有三个重要的挑战尚未得到很好的解决:(1)大多数现有方法都基于递归网络,由于不允许完全并行,因此训练长序列非常耗时;(2) 个性化偏好通常没有得到合理考虑;(3) 现有的方法很少系统地研究如何有效地利用轨迹数据中的各种辅助信息(如用户ID和时间戳)以及非连续位置之间的时空关系。为了应对上述挑战,我们提出了一种新的方法SanMove,一种基于自关注网络的模型,通过捕捉用户的长期和短期移动模式来预测下一个位置。具体来说,SanMove引入了一个长期偏好学习模块,它使用一个自我关注模块来捕获用户的长期移动模式,该模式可以表示用户的个性化位置偏好。同时,SanMove使用时空引导的非侵入性自我注意(STNOVA)来利用辅助信息来学习短期偏好。我们使用两个真实数据集对SanMove进行评估,并证明SanMove不仅比最先进的基于RNN的预测模型更快,而且在下一个位置预测方面也优于基线。 摘要:Currently, next location recommendation plays a vital role in location-based social network applications and services. Although many methods have been proposed to solve this problem, three important challenges have not been well addressed so far: (1) most existing methods are based on recurrent network, which is time-consuming to train long sequences due to not allowing for full parallelism; (2) personalized preferences generally are not considered reasonably; (3) existing methods rarely systematically studied how to efficiently utilize various auxiliary information (e.g., user ID and timestamp) in trajectory data and the spatio-temporal relations among non-consecutive locations. To address the above challenges, we propose a novel method named SanMove, a self-attention network based model, to predict the next location via capturing the long- and short-term mobility patterns of users. Specifically, SanMove introduces a long-term preference learning module, and it uses a self-attention module to capture the users long-term mobility pattern which can represent personalized location preferences of users. Meanwhile, SanMove uses a spatial-temporal guided non-invasive self-attention (STNOVA) to exploit auxiliary information to learn short-term preferences. We evaluate SanMove with two real-world datasets, and demonstrate SanMove is not only faster than the state-of-the-art RNN-based predict model but also outperforms the baselines for next location prediction.

聚类(2篇)

【1】 A Proposition-Level Clustering Approach for Multi-Document Summarization 标题:一种命题级别的多文档文摘聚类方法 链接:https://arxiv.org/abs/2112.08770

作者:Ori Ernst,Avi Caciularu,Ori Shapira,Ramakanth Pasunuru,Mohit Bansal,Jacob Goldberger,Ido Dagan 摘要:传统上,文本聚类方法被引入多文档摘要(MDS)中,作为处理大量信息重复的一种手段。聚类被用来指示信息显著性并避免冗余。这些方法侧重于对句子进行聚类,尽管密切相关的句子通常也包含无法对齐的信息。在这项工作中,我们重新审视了聚类方法,将命题聚合在一起以实现更精确的信息对齐。具体地说,我们的方法检测显著命题,将它们聚类成释义簇,并通过融合簇内命题为每个簇生成一个代表性句子。在DUC 2004和TAC 2011数据集上,我们的摘要方法在自动ROUGE评分和人工偏好评估方面都优于以前最先进的MDS方法。 摘要:Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition. Clusters were leveraged to indicate information saliency and to avoid redundancy. These methods focused on clustering sentences, even though closely related sentences also usually contain non-aligning information. In this work, we revisit the clustering approach, grouping together propositions for more precise information alignment. Specifically, our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster by fusing its propositions. Our summarization method improves over the previous state-of-the-art MDS method in the DUC 2004 and TAC 2011 datasets, both in automatic ROUGE scores and human preference.

【2】 KnAC: an approach for enhancing cluster analysis with background knowledge and explanations 标题:KNAC:一种利用背景知识和解释增强聚类分析的方法 链接:https://arxiv.org/abs/2112.08759

作者:Szymon Bobek,Michał Kuk,Jakub Brzegowski,Edyta Brzychczy,Grzegorz J. Nalepa 备注:Submitted to Applied Intelligence 摘要:自几十年来,多维数据集中的模式发现一直是研究的主题。有许多聚类算法可用于此目的。然而,它们的实际应用在后聚类阶段有着共同点,这涉及到基于专家的解释和对所得结果的分析。我们认为,这可能是过程的瓶颈,尤其是在领域知识存在于集群之前的情况下。这种情况不仅需要对自动发现的集群进行适当的分析,还需要对现有知识进行一致性检查。在这项工作中,我们提出了知识增强聚类(KnAC),其主要目标是将基于专家的标记与自动聚类相结合,以更新和细化前者。我们的解决方案不依赖于任何现成的聚类算法,也不引入任何一种。取而代之的是,KnAC可以作为任意聚类算法的扩充,使该方法具有鲁棒性和模型不可知性。我们在人工、可复制的示例和真实的用例场景中演示了我们的方法的可行性。 摘要:Pattern discovery in multidimensional data sets has been a subject of research since decades. There exists a wide spectrum of clustering algorithms that can be used for that purpose. However, their practical applications share in common the post-clustering phase, which concerns expert-based interpretation and analysis of the obtained results. We argue that this can be a bottleneck of the process, especially in the cases where domain knowledge exists prior to clustering. Such a situation requires not only a proper analysis of automatically discovered clusters, but also a conformance checking with existing knowledge. In this work, we present Knowledge Augmented Clustering (KnAC), which main goal is to confront expert-based labelling with automated clustering for the sake of updating and refining the former. Our solution does not depend on any ready clustering algorithm, nor introduce one. Instead KnAC can serve as an augmentation of an arbitrary clustering algorithm, making the approach robust and model-agnostic. We demonstrate the feasibility of our method on artificially, reproducible examples and on a real life use case scenario.

超分辨率|去噪|去模糊|去雾(1篇)

【1】 Stable Long-Term Recurrent Video Super-Resolution 标题:稳定的长期循环视频超分辨率 链接:https://arxiv.org/abs/2112.08950

作者:Benjamin Naoto Chiche,Arnaud Woiselle,Joana Frontera-Pons,Jean-Luc Starck 备注:9 pages, 8 figures 摘要:与基于滑动窗口的模型相比,基于深度学习(DL)的视频超分辨率(VSR)中的递归模型具有更高的计算效率、时间感受野和时间一致性,因此得到了广泛的应用。然而,当对呈现低运动的长视频序列(即场景的某些部分几乎不移动)进行推断时,递归模型通过递归处理发散,产生高频伪影。据我们所知,没有任何关于VSR的研究指出这种不稳定性问题,这对于一些实际应用来说是至关重要的。视频监控是一个典型的例子,在这种情况下会出现这种伪影,因为相机和场景都会长时间保持静止。在这项工作中,我们暴露了现有的循环VSR网络在低运动的长序列上的不稳定性。我们在我们创建的一个新的长序列数据集准静态视频集上演示了它。最后,基于Lipschitz稳定性理论,我们提出了一种新的循环VSR网络框架,它既稳定又有竞争性。基于此框架,我们提出了一种新的递归VSR网络,即中间递归视频超分辨率(MRVSR)。我们通过经验证明了它在低运动的长序列上的竞争性能。 摘要:Recurrent models have gained popularity in deep learning (DL) based video super-resolution (VSR), due to their increased computational efficiency, temporal receptive field and temporal consistency compared to sliding-window based models. However, when inferring on long video sequences presenting low motion (i.e. in which some parts of the scene barely move), recurrent models diverge through recurrent processing, generating high frequency artifacts. To the best of our knowledge, no study about VSR pointed out this instability problem, which can be critical for some real-world applications. Video surveillance is a typical example where such artifacts would occur, as both the camera and the scene stay static for a long time. In this work, we expose instabilities of existing recurrent VSR networks on long sequences with low motion. We demonstrate it on a new long sequence dataset Quasi-Static Video Set, that we have created. Finally, we introduce a new framework of recurrent VSR networks that is both stable and competitive, based on Lipschitz stability theory. We propose a new recurrent VSR network, coined Middle Recurrent Video Super-Resolution (MRVSR), based on this framework. We empirically show its competitive performance on long sequences with low motion.

自动驾驶|车辆|车道检测等(3篇)

【1】 End-to-End Multi-Task Deep Learning and Model Based Control Algorithm for Autonomous Driving 标题:端到端多任务深度学习和基于模型的自主驾驶控制算法 链接:https://arxiv.org/abs/2112.08967

作者:Der-Hau Lee,Jinn-Liang Liu 备注:10 pages, 7 figures 摘要:采用深度学习神经网络(DNN)的端到端驾驶已成为工业界和学术界快速发展的自主驾驶范例。然而,安全措施和可解释性仍然对这种模式构成挑战。我们提出了一种端到端驱动算法,该算法将多任务DNN、路径预测和控制模型集成在一条数据流管道中,数据流从传感器设备通过这些模型传输到驱动决策。它提供了定量的措施来评估端到端驱动系统的整体、动态和实时性能,从而可以量化其安全性和可解释性。DNN是一种改进的UNet,它是一种著名的语义分段编码-解码神经网络。它包括一个分割、一个回归和两个车道分割、路径预测和车辆控制的分类任务。我们提出了具有不同复杂性的改进的UNet体系结构的三种变体,在单任务和多任务(MT)体系结构的四种静态度量中对它们在不同任务上进行比较,然后在实时仿真中通过两种额外的动态度量来确定最佳的一种。我们还提出了一种基于学习和模型的纵向控制器,采用模型预测控制方法。使用Stanley横向控制器,我们的结果表明,MTUNet在正常速度下弯曲道路上的曲率和横向偏移估计方面优于早期修改的UNet,这已在真实道路上驾驶的真实汽车上进行了测试。 摘要:End-to-end driving with a deep learning neural network (DNN) has become a rapidly growing paradigm of autonomous driving in industry and academia. Yet safety measures and interpretability still pose challenges to this paradigm. We propose an end-to-end driving algorithm that integrates multi-task DNN, path prediction, and control models in a pipeline of data flow from sensory devices through these models to driving decisions. It provides quantitative measures to evaluate the holistic, dynamic, and real-time performance of end-to-end driving systems, and thus allows to quantify their safety and interpretability. The DNN is a modified UNet, a well known encoder-decoder neural network of semantic segmentation. It consists of one segmentation, one regression, and two classification tasks for lane segmentation, path prediction, and vehicle controls. We present three variants of the modified UNet architecture having different complexities, compare them on different tasks in four static measures for both single and multi-task (MT) architectures, and then identify the best one by two additional dynamic measures in real-time simulation. We also propose a learning- and model-based longitudinal controller using model predictive control method. With the Stanley lateral controller, our results show that MTUNet outperforms an earlier modified UNet in terms of curvature and lateral offset estimation on curvy roads at normal speed, which has been tested in a real car driving on real roads.

【2】 Improved YOLOv5 network for real-time multi-scale traffic sign detection 标题:用于实时多尺度交通标志检测的改进YOLOv5网络 链接:https://arxiv.org/abs/2112.08782

作者:Junfan Wang,Yi Chen,Mingyu Gao,Zhekang Dong 摘要:交通标志检测对于无人驾驶系统来说是一项具有挑战性的任务,尤其是对于多尺度目标的检测和检测的实时性问题。在交通标志检测过程中,目标的尺度变化很大,这会对检测精度产生一定的影响。特征金字塔被广泛用于解决这一问题,但它可能会破坏不同尺度交通标志的特征一致性。此外,在实际应用中,常规方法难以在保证实时检测的同时提高多尺度交通标志的检测精度。在本文中,我们提出了一种改进的特征金字塔模型AF-FPN,该模型利用自适应注意模块(AAM)和特征增强模块(FEM)来减少特征地图生成过程中的信息损失,增强特征金字塔的表示能力。我们用AF-FPN替换了YOLOv5中原有的特征金字塔网络,在保证实时检测的前提下,提高了YOLOv5网络对多尺度目标的检测性能。此外,还提出了一种新的自动学习数据扩充方法,以丰富数据集,提高模型的鲁棒性,使其更适合实际场景。在清华腾讯100K(TT100K)数据集上的大量实验结果表明,与几种最先进的方法相比,该方法具有有效性和优越性。 摘要:Traffic sign detection is a challenging task for the unmanned driving system, especially for the detection of multi-scale targets and the real-time problem of detection. In the traffic sign detection process, the scale of the targets changes greatly, which will have a certain impact on the detection accuracy. Feature pyramid is widely used to solve this problem but it might break the feature consistency across different scales of traffic signs. Moreover, in practical application, it is difficult for common methods to improve the detection accuracy of multi-scale traffic signs while ensuring real-time detection. In this paper, we propose an improved feature pyramid model, named AF-FPN, which utilizes the adaptive attention module (AAM) and feature enhancement module (FEM) to reduce the information loss in the process of feature map generation and enhance the representation ability of the feature pyramid. We replaced the original feature pyramid network in YOLOv5 with AF-FPN, which improves the detection performance for multi-scale targets of the YOLOv5 network under the premise of ensuring real-time detection. Furthermore, a new automatic learning data augmentation method is proposed to enrich the dataset and improve the robustness of the model to make it more suitable for practical scenarios. Extensive experimental results on the Tsinghua-Tencent 100K (TT100K) dataset demonstrate the effectiveness and superiority of the proposed method when compared with several state-of-the-art methods.

【3】 Deep Generative Models for Vehicle Speed Trajectories 标题:车辆速度轨迹的深层产生式模型 链接:https://arxiv.org/abs/2112.08361

作者:Farnaz Behnia,Dominik Karbowski,Vadim Sokolov 摘要:生成真实的车速轨迹是评估车辆燃油经济性和自动驾驶汽车预测控制的重要组成部分。传统的生成模型依赖于马尔可夫链方法,可以生成精确的合成轨迹,但会受到维数灾难的影响。它们不允许在生成过程中包含条件输入变量。在本文中,我们展示了对深层生成模型的扩展如何允许精确和可伸缩的生成。所提出的架构涉及循环层和前馈层,并使用对抗性技术进行训练。我们的模型在生成车辆轨迹方面表现良好,使用了一个根据芝加哥大都市区GPS数据训练的模型。 摘要:Generating realistic vehicle speed trajectories is a crucial component in evaluating vehicle fuel economy and in predictive control of self-driving cars. Traditional generative models rely on Markov chain methods and can produce accurate synthetic trajectories but are subject to the curse of dimensionality. They do not allow to include conditional input variables into the generation process. In this paper, we show how extensions to deep generative models allow accurate and scalable generation. Proposed architectures involve recurrent and feed-forward layers and are trained using adversarial techniques. Our models are shown to perform well on generating vehicle trajectories using a model trained on GPS data from Chicago metropolitan area.

联邦学习|隐私保护|加密(2篇)

【1】 CodedPaddedFL and CodedSecAgg: Straggler Mitigation and Secure Aggregation in Federated Learning 标题:CodedPaddedFL和CodedSecAgg:联合学习中的掉队缓解和安全聚合 链接:https://arxiv.org/abs/2112.08909

作者:Reent Schlegel,Siddhartha Kumar,Eirik Rosnes,Alexandre Graell i Amat 备注:12 pages, 7 figures, this work has been submitted to the IEEE for possible publication 摘要:我们提出了两种新的线性回归编码联邦学习(FL)方案,以减轻掉队设备的影响。第一种方案CodedPaddedFL在保持传统FL隐私水平的同时减轻了掉队设备的影响。特别是,它将用于用户数据隐私的一次性填充与梯度码结合起来,以产生对掉队设备的容错能力。为了对真实数据应用一次性填充,我们的方案利用了数据的定点算术表示。对于具有25台设备的场景,与传统FL相比,CodedPaddedFL在MNIST和Fashion-MNIST数据集上分别达到95%和85%的精度,加速系数分别为6.6和9.2。此外,与Prakash等人最近提出的方案相比,它在延迟方面性能相近,并且没有额外泄漏私有数据的缺点。第二个方案CodedSecAgg基于Shamir秘密共享,提供了对掉队设备的容错能力以及对模型反转攻击的鲁棒性。在具有120台设备的场景下,CodedSecAgg在MNIST数据集上的性能优于最先进的安全聚合方案(如LightSecAgg),加速系数为6.6-14.6(取决于共谋设备的数量),代价是与CodedPaddedFL相比延迟增加30%。 摘要:We present two novel coded federated learning (FL) schemes for linear regression that mitigate the effect of straggling devices. The first scheme, CodedPaddedFL, mitigates the effect of straggling devices while retaining the privacy level of conventional FL. Particularly, it combines one-time padding for user data privacy with gradient codes to yield resiliency against straggling devices. To apply one-time padding to real data, our scheme exploits a fixed-point arithmetic representation of the data. For a scenario with 25 devices, CodedPaddedFL achieves a speed-up factor of 6.6 and 9.2 for an accuracy of 95% and 85% on the MNIST and Fashion-MNIST datasets, respectively, compared to conventional FL. Furthermore, it yields similar performance in terms of latency compared to a recently proposed scheme by Prakash et al. without the shortcoming of additional leakage of private data. The second scheme, CodedSecAgg, provides straggler resiliency and robustness against model inversion attacks and is based on Shamir's secret sharing. CodedSecAgg outperforms state-of-the-art secure aggregation schemes such as LightSecAgg by a speed-up factor of 6.6-14.6, depending on the number of colluding devices, on the MNIST dataset for a scenario with 120 devices, at the expense of a 30% increase in latency compared to CodedPaddedFL.
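CodedPaddedFL 中"先把实数数据量化为定点数、再用一次性填充(one-time padding)保护数据"的思路可以用下面的 NumPy 草图示意。缩放因子与模数均为假设取值,梯度码与掉队设备容错部分未包含。

```python
import numpy as np

rng = np.random.default_rng(0)
SCALE, Q = 2 ** 10, 2 ** 31 - 1      # 定点数缩放因子与模数,取值仅作示意

def to_fixed_point(x):
    return np.round(x * SCALE).astype(np.int64) % Q

def one_time_pad(data_fp):
    pad = rng.integers(0, Q, size=data_fp.shape)     # 一次性密钥,均匀取自 [0, Q)
    return (data_fp + pad) % Q, pad                  # 填充后的数据可交给外部节点处理

def remove_pad(padded, pad):
    return (padded - pad) % Q

x = np.array([0.25, -1.5, 3.125])
fp = to_fixed_point(x)
padded, pad = one_time_pad(fp)
recovered = remove_pad(padded, pad)
# 还原为实数(把大于 Q/2 的值解释为负数)
signed = np.where(recovered > Q // 2, recovered - Q, recovered)
print(signed / SCALE)                                # 应近似等于原始 x
```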

【2】 Data Valuation for Vertical Federated Learning: An Information-Theoretic Approach 标题:垂直联合学习的数据评估:信息论方法 链接:https://arxiv.org/abs/2112.08364

作者:Xiao Han,Leye Wang,Junjie Wu 摘要:联邦学习(FL)是一种很有前途的机器学习范式,它以保护隐私且符合法律监管的方式为真实世界的AI应用实现多方(跨参与方)数据协作。如何评估各参与方数据的价值是FL中一个关键但具有挑战性的问题。在文献中,数据估值要么依赖于为给定任务运行特定模型,要么完全与任务无关;然而,在FL模型尚未确定的情况下,通常需要根据特定任务选择参与方。因此,这项工作填补了这一空白,提出了FedValue,据我们所知,这是第一种针对垂直FL任务的保护隐私、面向特定任务但无需模型的数据估值方法。具体而言,FedValue采用了一种称为Shapley-CMI的新信息论指标,从博弈论的角度评估多方的数据价值。此外,我们设计了一种新的服务器辅助联邦计算机制来计算Shapley-CMI,同时保护各方免受数据泄漏。我们还提出了几种在实践中加速Shapley-CMI计算的技术。在六个开放数据集上的大量实验验证了FedValue在垂直FL任务数据估值中的有效性和效率。特别是,Shapley-CMI作为一种无模型度量,其性能与依赖于运行一组性能良好的模型的度量相当。 摘要:Federated learning (FL) is a promising machine learning paradigm that enables cross-party data collaboration for real-world AI applications in a privacy-preserving and law-regulated way. How to valuate parties' data is a critical but challenging FL issue. In the literature, data valuation either relies on running specific models for a given task or is just task irrelevant; however, it is often requisite for party selection given a specific task when FL models have not been determined yet. This work thus fills the gap and proposes FedValue, to our best knowledge, the first privacy-preserving, task-specific but model-free data valuation method for vertical FL tasks. Specifically, FedValue incorporates a novel information-theoretic metric termed Shapley-CMI to assess data values of multiple parties from a game-theoretic perspective. Moreover, a novel server-aided federated computation mechanism is designed to compute Shapley-CMI and meanwhile protects each party from data leakage. We also propose several techniques to accelerate Shapley-CMI computation in practice. Extensive experiments on six open datasets validate the effectiveness and efficiency of FedValue for data valuation of vertical FL tasks. In particular, Shapley-CMI as a model-free metric performs comparably with the measures that depend on running an ensemble of well-performing models.
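Shapley-CMI 把条件互信息(CMI)作为特征函数,再按 Shapley 值把总效用分解到各参与方。下面的草图只演示 Shapley 分解本身:效用函数用一个手工设定的查表值代替真实的 CMI 估计,也未包含论文中的服务器辅助联邦计算机制。

```python
from itertools import combinations
from math import factorial

def shapley_values(parties, utility):
    """对任意特征函数 utility(S) 精确计算各参与方的 Shapley 值(参与方较少时可行)。"""
    n = len(parties)
    values = {p: 0.0 for p in parties}
    for p in parties:
        others = [q for q in parties if q != p]
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                marginal = utility(frozenset(subset) | {p}) - utility(frozenset(subset))
                values[p] += weight * marginal
    return values

# 占位的效用函数:真实的 FedValue 用各方特征与标签之间的条件互信息(CMI)作为效用,
# 这里用一个手工设定的查表值代替,仅演示 Shapley 分解本身
toy_utility = {frozenset(): 0.0, frozenset("A"): 0.3, frozenset("B"): 0.2, frozenset("C"): 0.1,
               frozenset("AB"): 0.6, frozenset("AC"): 0.35, frozenset("BC"): 0.4, frozenset("ABC"): 0.8}
print(shapley_values(["A", "B", "C"], lambda s: toy_utility[frozenset(s)]))
```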

推理|分析|理解|解释(8篇)

【1】 Human Hands as Probes for Interactive Object Understanding 标题:人的手作为交互式物体理解的探针 链接:https://arxiv.org/abs/2112.09120

作者:Mohit Goyal,Sahil Modi,Rishabh Goyal,Saurabh Gupta 备注:Project website at this https URL 摘要:交互式对象理解,或者说我们可以对对象做什么,以及如何做,是计算机视觉的一个长期目标。在本文中,我们通过在以自我为中心的视频中观察人类的手来解决这个问题。我们证明了观察人的手与什么相互作用以及如何提供相关数据和必要的监督。注意手,容易定位和稳定活动对象进行学习,并揭示与对象发生交互的位置。通过分析手,我们可以了解我们可以对物体做什么,以及如何处理。我们将这些基本原则应用于EPIC-KITCHENS数据集,并通过观察以自我为中心的视频中的手,成功地学习了状态敏感特征和对象启示(交互区域和提供的抓握)。 摘要:Interactive object understanding, or what we can do to objects and how is a long-standing goal of computer vision. In this paper, we tackle this problem through observation of human hands in in-the-wild egocentric videos. We demonstrate that observation of what human hands interact with and how can provide both the relevant data and the necessary supervision. Attending to hands, readily localizes and stabilizes active objects for learning and reveals places where interactions with objects occur. Analyzing the hands shows what we can do to objects and how. We apply these basic principles on the EPIC-KITCHENS dataset, and successfully learn state-sensitive features, and object affordances (regions of interaction and afforded grasps), purely by observing hands in egocentric videos.

【2】 Non-Gaussian Component Analysis via Lattice Basis Reduction 标题:基于格基约简的非高斯分量分析 链接:https://arxiv.org/abs/2112.09104

作者:Ilias Diakonikolas,Daniel M. Kane 摘要:非高斯成分分析(NGCA)是如下分布学习问题:给定来自$\mathbb{R}^d$上某分布的i.i.d.样本,该分布在一个隐藏方向$v$上为非高斯分布,而在正交方向上为独立的标准高斯分布,目标是近似该隐藏方向$v$。先前的工作 [DKS17-sq] 提供了正式证据,表明在一元非高斯分布$A$满足适当矩匹配条件时,NGCA存在信息-计算权衡。当分布$A$为离散分布时,上述结果并不适用。一个自然的问题是,在这种情形下信息-计算权衡是否仍然存在。在本文中,我们对该问题给出否定回答:在明确定义的技术意义下,当$A$为离散或近似离散分布时,我们给出了NGCA的样本和计算上均高效的算法。我们算法中使用的关键工具是用于格基约简的LLL方法 [LLL82]。 摘要:Non-Gaussian Component Analysis (NGCA) is the following distribution learning problem: Given i.i.d. samples from a distribution on $\mathbb{R}^d$ that is non-gaussian in a hidden direction $v$ and an independent standard Gaussian in the orthogonal directions, the goal is to approximate the hidden direction $v$. Prior work [DKS17-sq] provided formal evidence for the existence of an information-computation tradeoff for NGCA under appropriate moment-matching conditions on the univariate non-gaussian distribution $A$. The latter result does not apply when the distribution $A$ is discrete. A natural question is whether information-computation tradeoffs persist in this setting. In this paper, we answer this question in the negative by obtaining a sample and computationally efficient algorithm for NGCA in the regime that $A$ is discrete or nearly discrete, in a well-defined technical sense. The key tool leveraged in our algorithm is the LLL method [LLL82] for lattice basis reduction.

【3】 Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation 标题:通过有效影响估计从优化的角度理解记忆 链接:https://arxiv.org/abs/2112.08798

作者:Futong Liu,Tao Lin,Martin Jaggi 摘要:过度参数化的深度神经网络能够在保持较小泛化误差的同时获得良好的训练精度。人们还发现,它们能够拟合任意标签,这种行为被称为记忆现象。在这项工作中,我们借助turn-over dropout(一种高效估计影响与记忆的方法),研究了具有真实标签的数据(真实数据)和具有随机标签的数据(随机数据)上的记忆现象。我们的主要发现是:(i)对于真实数据和随机数据,网络同时对简单示例(如真实数据)和困难示例(如随机数据)进行优化,且简单示例的优化速度更快;(ii)对于真实数据,训练集中标注正确的困难示例比简单示例包含更多信息。通过展示随机数据和真实数据上记忆现象的存在,我们强调了二者在优化上的一致性,并强调了记忆在优化过程中的意义。 摘要:Over-parameterized deep neural networks are able to achieve excellent training accuracy while maintaining a small generalization error. It has also been found that they are able to fit arbitrary labels, and this behaviour is referred to as the phenomenon of memorization. In this work, we study the phenomenon of memorization with turn-over dropout, an efficient method to estimate influence and memorization, for data with true labels (real data) and data with random labels (random data). Our main findings are: (i) For both real data and random data, the optimization of easy examples (e.g., real data) and difficult examples (e.g., random data) are conducted by the network simultaneously, with easy ones at a higher speed; (ii) For real data, a correct difficult example in the training dataset is more informative than an easy one. By showing the existence of memorization on random data and real data, we highlight the consistency between them regarding optimization and we emphasize the implication of memorization during optimization.

【4】 Invariance Through Inference 标题:推论不变性 链接:https://arxiv.org/abs/2112.08526

作者:Takuma Yoneda,Ge Yang,Matthew R. Walter,Bradly Stadie 备注:In submission to ICLR2022. Here's our project page: this https URL 摘要:我们介绍了一种称为推理不变性的通用方法,用于在感知变化未知的部署环境中提高代理的测试时性能。与通过插值产生不变的视觉特征不同,通过推理产生的不变性将部署时的自适应转化为无监督学习问题。这在实践中是通过部署一个简单的算法来实现的,该算法尝试将潜在特征的分布与代理的先前经验相匹配,而不依赖于成对的数据。虽然很简单,但我们表明,这一想法可以在各种适应场景中带来令人惊讶的改进,而无需获得部署时间奖励,包括改变相机姿势和照明条件。结果显示在具有挑战性的干扰物控制套件上,这是一个基于图像观察的机器人环境。 摘要:We introduce a general approach, called Invariance through Inference, for improving the test-time performance of an agent in deployment environments with unknown perceptual variations. Instead of producing invariant visual features through interpolation, invariance through inference turns adaptation at deployment-time into an unsupervised learning problem. This is achieved in practice by deploying a straightforward algorithm that tries to match the distribution of latent features to the agent's prior experience, without relying on paired data. Although simple, we show that this idea leads to surprising improvements on a variety of adaptation scenarios without access to deployment-time rewards, including changes in camera poses and lighting conditions. Results are presented on challenging distractor control suite, a robotics environment with image-based observations.

【5】 Towards Explainable Artificial Intelligence in Banking and Financial Services 标题:走向银行和金融服务中的可解释人工智能 链接:https://arxiv.org/abs/2112.08441

作者:Ambreen Hanif 摘要:人工智能(AI)使机器能够从人类经验中学习,适应新的输入,并执行类似于人类的任务。人工智能发展迅速,正在改变企业运营方式,从流程自动化到任务认知增强和智能流程/数据分析。然而,人类用户面临的主要挑战是理解并适当信任人工智能算法和方法的结果。在本文中,为了应对这一挑战,我们研究和分析了最近在可解释人工智能(XAI)方法和工具方面所做的工作。我们介绍了一种新的XAI过程,它有助于生成可解释的模型,同时保持高水平的学习性能。我们提出了一种交互式的基于证据的方法来帮助人类用户理解和信任人工智能算法产生的结果和输出。我们采用银行领域的典型场景来分析客户交易。我们开发了一个数字仪表盘,以便于与算法结果进行交互,并讨论了拟议的XAI方法如何显著提高数据科学家理解AI算法结果的信心。 摘要:Artificial intelligence (AI) enables machines to learn from human experience, adjust to new inputs, and perform human-like tasks. AI is progressing rapidly and is transforming the way businesses operate, from process automation to cognitive augmentation of tasks and intelligent process/data analytics. However, the main challenge for human users would be to understand and appropriately trust the result of AI algorithms and methods. In this paper, to address this challenge, we study and analyze the recent work done in Explainable Artificial Intelligence (XAI) methods and tools. We introduce a novel XAI process, which facilitates producing explainable models while maintaining a high level of learning performance. We present an interactive evidence-based approach to assist human users in comprehending and trusting the results and output created by AI-enabled algorithms. We adopt a typical scenario in the Banking domain for analyzing customer transactions. We develop a digital dashboard to facilitate interacting with the algorithm results and discuss how the proposed XAI method can significantly improve the confidence of data scientists in understanding the result of AI-enabled algorithms.

【6】 Generalization Bounds for Stochastic Gradient Langevin Dynamics: A Unified View via Information Leakage Analysis 标题:随机梯度朗之万动力学的广义界:基于信息泄漏分析的统一观点 链接:https://arxiv.org/abs/2112.08439

作者:Bingzhe Wu,Zhicong Liang,Yatao Bian,ChaoChao Chen,Junzhou Huang,Yuan Yao 摘要:最近,利用随机梯度朗之万动力学(SGLD)对非凸经验风险最小化范式的推广界进行了广泛的研究。人们从不同的角度提出了一些理论框架来研究这个问题,如信息论和稳定性。在本文中,我们从隐私泄漏分析中提出了一个统一的观点来研究SGLD的泛化边界,并提供了一个理论框架,以简洁的方式重新推导先前的结果。除了理论发现之外,我们还进行了各种数值研究来实证评估SGLD的信息泄漏问题。此外,我们的理论和实证结果为先前研究SGLD成员隐私的工作提供了解释。 摘要:Recently, generalization bounds of the non-convex empirical risk minimization paradigm using Stochastic Gradient Langevin Dynamics (SGLD) have been extensively studied. Several theoretical frameworks have been presented to study this problem from different perspectives, such as information theory and stability. In this paper, we present a unified view from privacy leakage analysis to investigate the generalization bounds of SGLD, along with a theoretical framework for re-deriving previous results in a succinct manner. Aside from theoretical findings, we conduct various numerical studies to empirically assess the information leakage issue of SGLD. Additionally, our theoretical and empirical results provide explanations for prior works that study the membership privacy of SGLD.
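作为背景补充,下面给出标准 SGLD 更新规则 $\theta_{t+1}=\theta_t-\eta\nabla\hat{L}(\theta_t)+\sqrt{2\eta/\beta}\,\xi_t$ 的一个极简 numpy 示意(玩具二次损失,学习率与逆温度均为假设取值),仅说明摘要所分析的算法形式,与论文中推导泛化界的具体设定无关。

```python
import numpy as np

rng = np.random.default_rng(1)

# 玩具经验风险:L(theta) = mean((theta - x_i)^2),随机梯度来自小批量
data = rng.normal(loc=2.0, scale=1.0, size=1000)

def sgld(theta0, eta=1e-3, beta=1.0, steps=5000, batch=32):
    """标准 SGLD 更新:theta <- theta - eta*grad + sqrt(2*eta/beta)*N(0,1)。"""
    theta = theta0
    for _ in range(steps):
        xb = rng.choice(data, size=batch)
        grad = 2.0 * np.mean(theta - xb)                 # 小批量梯度
        noise = rng.normal() * np.sqrt(2.0 * eta / beta)  # 朗之万噪声
        theta = theta - eta * grad + noise
    return theta

print(sgld(theta0=-5.0))   # 应收敛到数据均值 2 附近的随机样本
```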

【7】 BayesFlow can reliably detect Model Misspecification and Posterior Errors in Amortized Bayesian Inference 标题:BayesFlow可以可靠地检测摊销贝叶斯推理中的模型错误和后验误差 链接:https://arxiv.org/abs/2112.08866

作者:Marvin Schmitt,Paul-Christian Bürkner,Ullrich Köthe,Stefan T. Radev 备注:14 pages, 7 figures 摘要:神经密度估计器在不同的研究领域中被证明在执行基于模拟的贝叶斯推理方面非常强大。特别是,BayesFlow框架使用两步方法,在模拟程序隐式定义似然函数的情况下,实现摊销参数估计。但是,当模拟不能很好地反映现实时,这种推断有多可靠呢?在本文中,我们概念化了基于仿真的推理中出现的模型错误指定的类型,并系统地研究了BayesFlow框架在这些错误指定下的性能。我们提出了一个增广优化目标,该目标在潜在数据空间上施加概率结构,并利用最大平均差异(MMD)来检测推理过程中潜在的灾难性错误,从而破坏所获得结果的有效性。我们根据大量人工和现实的错误说明验证了我们的检测标准,从玩具共轭模型到应用于真实数据的决策和疾病爆发动力学的复杂模型。此外,我们还表明,后验推理误差随着真实数据生成分布与潜在摘要空间中典型模拟集之间的距离的增加而增加。因此,我们证明了MMD作为一种检测模型错误指定的方法和作为一种验证摊销贝叶斯推理可信度的代理的双重效用。 摘要:Neural density estimators have proven remarkably powerful in performing efficient simulation-based Bayesian inference in various research domains. In particular, the BayesFlow framework uses a two-step approach to enable amortized parameter estimation in settings where the likelihood function is implicitly defined by a simulation program. But how faithful is such inference when simulations are poor representations of reality? In this paper, we conceptualize the types of model misspecification arising in simulation-based inference and systematically investigate the performance of the BayesFlow framework under these misspecifications. We propose an augmented optimization objective which imposes a probabilistic structure on the latent data space and utilize maximum mean discrepancy (MMD) to detect potentially catastrophic misspecifications during inference undermining the validity of the obtained results. We verify our detection criterion on a number of artificial and realistic misspecifications, ranging from toy conjugate models to complex models of decision making and disease outbreak dynamics applied to real data. Further, we show that posterior inference errors increase as a function of the distance between the true data-generating distribution and the typical set of simulations in the latent summary space. Thus, we demonstrate the dual utility of MMD as a method for detecting model misspecification and as a proxy for verifying the faithfulness of amortized Bayesian inference.
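摘要中用最大平均差异(MMD)在潜在摘要空间中检测模型错误设定;下面是带高斯核的(有偏)MMD$^2$ 估计的一个独立示意实现,其中的数据、维度与核宽度均为假设示例,并非 BayesFlow 的实际代码。

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # x: (n, d), y: (m, d),返回高斯核矩阵
    d2 = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2.0 * x @ y.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """有偏 MMD^2 估计:E[k(x,x')] + E[k(y,y')] - 2E[k(x,y)]。"""
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
z_sim = rng.normal(size=(500, 4))               # 模拟数据在潜在摘要空间中的表示(假设)
z_obs_ok = rng.normal(size=(200, 4))            # 与模拟一致的观测
z_obs_bad = rng.normal(loc=1.5, size=(200, 4))  # 错误设定导致偏移的观测
print(mmd2(z_sim, z_obs_ok), mmd2(z_sim, z_obs_bad))  # 后者应显著更大
```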

【8】 Explainable Natural Language Processing with Matrix Product States 标题:基于矩阵乘积状态的可解释自然语言处理 链接:https://arxiv.org/abs/2112.08628

作者:Jirawat Tangpanitanon,Chanatip Mangkang,Pradeep Bhadola,Yuichiro Minato,Dimitris Angelakis,Thiparat Chotibut 备注:25 pages, 7 figures 摘要:尽管递归神经网络(RNN)在自然语言处理(NLP)中取得了经验上的成功,但由于RNN中固有的复杂计算,对RNN的理论理解仍然有限。我们通过一类称为递归算术电路(RAC)的RNN和矩阵乘积状态(MPS)之间的映射,对普遍存在的NLP任务(电影评论的情绪分析)中RNN的行为进行系统分析。使用冯·诺依曼纠缠熵(EE)作为信息传播的代理,我们证明了单层RAC具有最大的信息传播能力,这可以通过EE的饱和来反映。将MPS的键维数扩大到EE饱和阈值之外不会提高预测精度,因此可以构建一个最能估计数据统计信息的最小模型。虽然饱和EE小于MPS面积定律所能达到的最大EE,但我们的模型在现实情绪分析数据集中达到了99%的训练精度。因此,仅低EE并不是反对NLP采用单层RAC的理由。与远程信息传播是RNN表达能力的主要来源这一普遍观点相反,我们发现单层RAC还利用有意义的词向量嵌入的高表达能力。我们的工作使用多体量子物理的工具,揭示了RACs中学习的现象学,更广泛地揭示了NLP中RNN的可解释性方面。 摘要:Despite empirical successes of recurrent neural networks (RNNs) in natural language processing (NLP), theoretical understanding of RNNs is still limited due to intrinsically complex computations in RNNs. We perform a systematic analysis of RNNs' behaviors in a ubiquitous NLP task, the sentiment analysis of movie reviews, via the mapping between a class of RNNs called recurrent arithmetic circuits (RACs) and a matrix product state (MPS). Using the von-Neumann entanglement entropy (EE) as a proxy for information propagation, we show that single-layer RACs possess a maximum information propagation capacity, reflected by the saturation of the EE. Enlarging the bond dimension of an MPS beyond the EE saturation threshold does not increase the prediction accuracies, so a minimal model that best estimates the data statistics can be constructed. Although the saturated EE is smaller than the maximum EE achievable by the area law of an MPS, our model achieves ~99% training accuracies in realistic sentiment analysis data sets. Thus, low EE alone is not a warrant against the adoption of single-layer RACs for NLP. Contrary to a common belief that long-range information propagation is the main source of RNNs' expressiveness, we show that single-layer RACs also harness high expressiveness from meaningful word vector embeddings. Our work sheds light on the phenomenology of learning in RACs and more generally on the explainability aspects of RNNs for NLP, using tools from many-body quantum physics.
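作为补充,冯·诺依曼纠缠熵可由二分后态矩阵的奇异值(Schmidt 系数)计算:$S=-\sum_i \lambda_i^2 \log \lambda_i^2$($\lambda_i$ 为归一化奇异值)。下面的 numpy 片段按这一通用定义做演示,示例矩阵为随机构造,与论文中 RAC/MPS 的具体实现无关。

```python
import numpy as np

def entanglement_entropy(psi_matrix):
    """对二分后的态矩阵做 SVD,由归一化的 Schmidt 系数计算冯·诺依曼纠缠熵。"""
    s = np.linalg.svd(psi_matrix, compute_uv=False)
    p = (s / np.linalg.norm(s)) ** 2      # 归一化得到概率 lambda_i^2
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(0)
low_rank = np.outer(rng.normal(size=8), rng.normal(size=8))  # 近似乘积态:EE 约为 0
full_rank = rng.normal(size=(8, 8))                           # 一般随机态:EE 较大
print(entanglement_entropy(low_rank), entanglement_entropy(full_rank))
```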

检测相关(3篇)

【1】 A Static Analyzer for Detecting Tensor Shape Errors in Deep Neural Network Training Code 标题:用于检测深度神经网络训练代码中张量形状错误的静态分析器 链接:https://arxiv.org/abs/2112.09037

作者:Ho Young Jhoo,Sehoon Kim,Woosung Song,Kyuyeon Park,DongKwon Lee,Kwangkeun Yi 摘要:我们提出了一种自动静态分析器PyTea,它可以检测PyTorch代码中的张量形状错误。张量形状误差是深层神经网络编码的关键;一旦张量形状不匹配在训练阶段发生,许多训练成本和中间结果将丢失。给定输入PyTorch源,PyTea静态跟踪每个可能的执行路径,收集路径的张量操作序列所需的张量形状约束,并确定约束是否不可满足(因此可能发生形状错误)。PyTea的可伸缩性和精度取决于现实世界PyTorch应用程序的特征:PyTea保守修剪后的执行路径数量很少爆炸,循环非常简单,可以由我们的符号抽象限定。我们针对官方PyTorch存储库中的项目和StackOverflow中质疑的一些张量错误代码测试了PyTea。PyTea在几秒钟内成功检测到这些代码中的张量形状错误。 摘要:We present an automatic static analyzer PyTea that detects tensor-shape errors in PyTorch code. The tensor-shape error is critical in the deep neural net code; much of the training cost and intermediate results are to be lost once a tensor shape mismatch occurs in the midst of the training phase. Given the input PyTorch source, PyTea statically traces every possible execution path, collects tensor shape constraints required by the tensor operation sequence of the path, and decides if the constraints are unsatisfiable (hence a shape error can occur). PyTea's scalability and precision hinges on the characteristics of real-world PyTorch applications: the number of execution paths after PyTea's conservative pruning rarely explodes and loops are simple enough to be circumscribed by our symbolic abstraction. We tested PyTea against the projects in the official PyTorch repository and some tensor-error code questioned in the StackOverflow. PyTea successfully detects tensor shape errors in these codes, each within a few seconds.
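下面是一个最小的 PyTorch 片段,人为构造了 PyTea 旨在静态检测的那类张量形状错误(网络结构为假设示例):卷积输出展平后的特征数与全连接层的 in_features 不一致,只有真正执行到该层时才会抛错,这正是在运行前静态发现此类错误的价值所在。

```python
import torch
import torch.nn as nn

# 假设的网络:卷积输出展平后的特征数与 Linear 的 in_features 不一致
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),   # 28x28 -> 26x26
    nn.Flatten(),
    nn.Linear(8 * 28 * 28, 10),       # 形状错误:正确应为 8 * 26 * 26
)

x = torch.randn(4, 1, 28, 28)
try:
    model(x)                          # 只有运行到这里才会暴露形状不匹配
except RuntimeError as e:
    print("shape error:", e)
```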

【2】 Utilizing XAI technique to improve autoencoder based model for computer network anomaly detection with shapley additive explanation(SHAP) 标题:利用XAI技术改进基于自动编码器的Shapley加性解释(Shap)计算机网络异常检测模型 链接:https://arxiv.org/abs/2112.08442

作者:Khushnaseeb Roshan,Aasim Zafar 备注:None 摘要:机器学习(ML)和深度学习(DL)方法被迅速采用,尤其是在计算机网络安全领域,如欺诈检测、网络异常检测、入侵检测等。然而,基于ML和DL的模型缺乏透明度是它们实现的一个主要障碍,并且由于其黑盒性质而受到批评,即使有如此巨大的结果。可解释人工智能(XAI)是一个很有前途的领域,它可以通过解释和解释模型的输出来提高模型的可信度。如果基于ML和DL的模型的内部工作是可以理解的,那么它可以进一步帮助改进其性能。本文的目的是展示如何使用XAI来解释DL模型的结果,在本例中是自动编码器。并在此基础上,改进了其在计算机网络异常检测中的性能。基于shapley值的核SHAP方法是一种新的特征选择技术。此方法仅用于识别实际导致攻击/异常实例集异常行为的特征。之后,这些特征集用于训练和验证自动编码器,但仅用于良性数据。最后,构建的SHAP_模型优于基于特征选择方法提出的其他两个模型。整个实验是在最新的CICIDS2017网络数据集的子集上进行的。SHAP_模型的总体准确度和AUC分别为94%和0.969。 摘要:Machine learning (ML) and Deep Learning (DL) methods are being adopted rapidly, especially in computer network security, such as fraud detection, network anomaly detection, intrusion detection, and much more. However, the lack of transparency of ML and DL based models is a major obstacle to their implementation and criticized due to its black-box nature, even with such tremendous results. Explainable Artificial Intelligence (XAI) is a promising area that can improve the trustworthiness of these models by giving explanations and interpreting its output. If the internal working of the ML and DL based models is understandable, then it can further help to improve its performance. The objective of this paper is to show that how XAI can be used to interpret the results of the DL model, the autoencoder in this case. And, based on the interpretation, we improved its performance for computer network anomaly detection. The kernel SHAP method, which is based on the shapley values, is used as a novel feature selection technique. This method is used to identify only those features that are actually causing the anomalous behaviour of the set of attack/anomaly instances. Later, these feature sets are used to train and validate the autoencoder but on benign data only. Finally, the built SHAP_Model outperformed the other two models proposed based on the feature selection method. This whole experiment is conducted on the subset of the latest CICIDS2017 network dataset. The overall accuracy and AUC of SHAP_Model is 94% and 0.969, respectively.
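下面给出该思路的一个示意性实现:用一个极简自动编码器的重构误差作为异常分数,再用 SHAP 库的 KernelExplainer 解释哪些特征推高了某个异常样本的重构误差。其中的数据、网络规模与人为注入的异常均为假设示例,并非论文在 CICIDS2017 上的实际流程。

```python
import numpy as np
import shap
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_benign = rng.normal(size=(300, 6))                 # 良性流量特征(假设数据)

# 用 MLP 充当一个极简自动编码器:输入重构输入
ae = MLPRegressor(hidden_layer_sizes=(3,), max_iter=2000, random_state=0)
ae.fit(X_benign, X_benign)

def recon_error(X):
    """异常分数 = 每个样本的重构误差。"""
    X = np.atleast_2d(X)
    return np.mean((ae.predict(X) - X) ** 2, axis=1)

# Kernel SHAP:解释哪些特征推高了攻击样本的重构误差
background = shap.sample(X_benign, 50)
explainer = shap.KernelExplainer(recon_error, background)
x_attack = X_benign[0] + np.array([0, 0, 4.0, 0, 0, 0])   # 人为注入的异常
shap_values = explainer.shap_values(x_attack, nsamples=200)
print(np.round(shap_values, 3))   # 第 3 个特征的贡献应最大,可作为特征选择依据
```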

【3】 Real-time Detection of Anomalies in Multivariate Time Series of Astronomical Data 标题:天文数据多变量时间序列异常的实时检测 链接:https://arxiv.org/abs/2112.08415

作者:Daniel Muthukrishna,Kaisey S. Mandel,Michelle Lochner,Sara Webb,Gautham Narayan 备注:9 pages, 5 figures, Accepted at the NeurIPS 2021 workshop on Machine Learning and the Physical Sciences 摘要:天文瞬变是指在不同的时间尺度上暂时变亮的恒星物体,它导致了宇宙学和天文学中一些最重要的发现。其中一些瞬变是被称为超新星的恒星爆炸性死亡,而另一些是罕见的、奇异的或全新的令人兴奋的恒星爆炸。新的天文天象观测正在观测数量空前的多波长瞬变,使得视觉识别新的有趣瞬变的标准方法变得不可行。为了满足这一需求,我们提出了两种新的方法,旨在快速、自动地实时检测异常瞬态光曲线。这两种方法都基于一个简单的想法,即如果已知瞬变总体的光照曲线可以精确建模,那么与模型预测的任何偏差都可能是异常。第一种方法是使用时间卷积网络(TCN)构建的概率神经网络,第二种方法是瞬态的可解释贝叶斯参数模型。我们表明,与我们的参数模型相比,神经网络的灵活性(使其成为许多回归任务的强大工具的属性)使其不适合异常检测。 摘要:Astronomical transients are stellar objects that become temporarily brighter on various timescales and have led to some of the most significant discoveries in cosmology and astronomy. Some of these transients are the explosive deaths of stars known as supernovae while others are rare, exotic, or entirely new kinds of exciting stellar explosions. New astronomical sky surveys are observing unprecedented numbers of multi-wavelength transients, making standard approaches of visually identifying new and interesting transients infeasible. To meet this demand, we present two novel methods that aim to quickly and automatically detect anomalous transient light curves in real-time. Both methods are based on the simple idea that if the light curves from a known population of transients can be accurately modelled, any deviations from model predictions are likely anomalies. The first approach is a probabilistic neural network built using Temporal Convolutional Networks (TCNs) and the second is an interpretable Bayesian parametric model of a transient. We show that the flexibility of neural networks, the attribute that makes them such a powerful tool for many regression tasks, is what makes them less suitable for anomaly detection when compared with our parametric model.

分类|识别(2篇)

【1】 A CNN based method for Sub-pixel Urban Land Cover Classification using Landsat-5 TM and Resourcesat-1 LISS-IV Imagery 标题:基于CNN的Landsat-5TM和Resourcesat-1 LISS-IV影像城市土地覆盖亚像素分类方法 链接:https://arxiv.org/abs/2112.08841

作者:Krishna Kumar Perikamana,Krishnachandran Balakrishnan,Pratyush Tripathy 备注:29 pages, 14 figures (including appendix), 8 tables (including appendix) 摘要:城市土地覆盖的时间序列数据在分析城市增长模式、不透水表面和植被分布的变化及其对城市微气候的影响方面具有重要的应用价值。由于拥有长时间序列的免费影像,陆地卫星(Landsat)数据非常适合进行此类分析,但传统的逐像素硬分类无法充分发挥陆地卫星数据的潜力。本文提出了一种利用Landsat-5 TM和Resourcesat-1 LISS-IV传感器时间重叠的亚像素分类方法。我们训练了一个卷积神经网络,从30米的Landsat-5 TM数据预测土地覆盖比例图。参考土地覆盖比例是根据2011年班加罗尔5.8米分辨率的LISS-IV硬分类影像估算的。此外,我们使用2009年孟买的数据证明了该模型的通用性和优越性能,并将其与使用随机森林分类器获得的结果进行了比较。对于班加罗尔(2011年)和孟买(2009年)的数据,我们的CNN模型在30米单元水平上的建筑物和植被比例预测的平均绝对百分比误差在7.2到11.3之间。与最近使用有限空间范围内的数据进行验证的研究不同,我们的模型已经使用两个特大城市在两个不同时间段内的完整空间范围的数据进行了训练和验证。因此,它可以从Landsat-5 TM时间序列数据可靠地生成30米的建筑和植被比例图,以分析长期城市增长模式。 摘要:Time series data of urban land cover is of great utility in analyzing urban growth patterns, changes in distribution of impervious surface and vegetation and resulting impacts on urban micro climate. While Landsat data is ideal for such analysis due to the long time series of free imagery, traditional per-pixel hard classification fails to yield full potential of the Landsat data. This paper proposes a sub-pixel classification method that leverages the temporal overlap of Landsat-5 TM and Resourcesat-1 LISS-IV sensors. We train a convolutional neural network to predict fractional land cover maps from 30m Landsat-5 TM data. The reference land cover fractions are estimated from a hard-classified 5.8m LISS-IV image for Bengaluru from 2011. Further, we demonstrate the generalizability and superior performance of the proposed model using data for Mumbai from 2009 and comparing it to the results obtained using a Random Forest classifier. For both Bengaluru (2011) and Mumbai (2009) data, Mean Absolute Percentage Error of our CNN model is in the range of 7.2 to 11.3 for both built-up and vegetation fraction prediction at the 30m cell level. Unlike most recent studies where validation is conducted using data for a limited spatial extent, our model has been trained and validated using data for the complete spatial extent of two mega cities for two different time periods. Hence it can reliably generate 30m built-up and vegetation fraction maps from Landsat-5 TM time series data to analyze long term urban growth patterns.

【2】 Classification Under Ambiguity: When Is Average-K Better Than Top-K? 标题:歧义下的分类:Average-K何时优于Top-K? 链接:https://arxiv.org/abs/2112.08851

作者:Titouan Lorieul,Alexis Joly,Dennis Shasha 备注:53 pages, 21 figures 摘要:当可能有多个标签时,只选择单个标签可能会导致精度低。一个常见的替代方法称为top-$K$分类,即选择某个数字$K$(通常约为5),并返回得分最高的$K$个标签。不幸的是,对于无歧义的情况,$K>1$太多;而对于非常模糊的情况,$K \leq 5$(例如)又可能太小。另一种合理的策略是使用自适应方法,其中返回的标签数量随所估计的歧义度而变化,但在所有样本上平均为某个特定的$K$。我们将这种替代方案称为平均-$K$分类。本文正式刻画了在何种歧义度分布下,平均-$K$分类能够比固定的top-$K$分类获得更低的错误率。此外,本文为固定大小和自适应分类器提供了自然的估计过程,并证明了它们的一致性。最后,本文报告了在真实世界图像数据集上的实验,揭示了平均-$K$分类在实践中相对于top-$K$分类的好处。总的来说,当歧义度被精确地知道时,平均-$K$永远不会比top-$K$差;而且在我们的实验中,当歧义度是估计得到时,这一结论同样成立。 摘要:When many labels are possible, choosing a single one can lead to low precision. A common alternative, referred to as top-$K$ classification, is to choose some number $K$ (commonly around 5) and to return the $K$ labels with the highest scores. Unfortunately, for unambiguous cases, $K>1$ is too many and, for very ambiguous cases, $K \leq 5$ (for example) can be too small. An alternative sensible strategy is to use an adaptive approach in which the number of labels returned varies as a function of the computed ambiguity, but must average to some particular $K$ over all the samples. We denote this alternative average-$K$ classification. This paper formally characterizes the ambiguity profile when average-$K$ classification can achieve a lower error rate than a fixed top-$K$ classification. Moreover, it provides natural estimation procedures for both the fixed-size and the adaptive classifier and proves their consistency. Finally, it reports experiments on real-world image data sets revealing the benefit of average-$K$ classification over top-$K$ in practice. Overall, when the ambiguity is known precisely, average-$K$ is never worse than top-$K$, and, in our experiments, when it is estimated, this also holds.
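下面用 numpy 演示两种决策规则的差别:top-$K$ 对每个样本固定返回 $K$ 个标签;平均-$K$ 则用一个全局分数阈值,使返回的标签数随样本歧义度变化、但在全体样本上平均约为 $K$。阈值的取法只是示意,类别数与概率均为假设数据。

```python
import numpy as np

rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=np.full(20, 0.3), size=1000)   # 1000 个样本、20 类的预测概率(假设)

def top_k(probs, k=5):
    """每个样本固定返回得分最高的 k 个标签。"""
    return [set(np.argsort(-p)[:k]) for p in probs]

def average_k(probs, k=5):
    """自适应规则:取一个全局概率阈值,使平均返回标签数约为 k。"""
    flat = np.sort(probs.ravel())[::-1]
    tau = flat[len(probs) * k - 1]          # 第 n*k 大的分数作为阈值(示意取法)
    return [set(np.where(p >= tau)[0]) for p in probs]

pred_avg = average_k(probs, k=5)
sizes = np.array([len(s) for s in pred_avg])
print(sizes.mean(), sizes.min(), sizes.max())   # 平均约为 5,但随歧义度在样本间变化
```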

表征(2篇)

【1】 Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation 标题:Slot-VPS:视频全景分割中的以对象为中心的表示学习 链接:https://arxiv.org/abs/2112.08949

作者:Yi Zhou,Hui Zhang,Hana Lee,Shuyang Sun,Pingjun Li,Yangguang Zhu,ByungIn Yoo,Xiaojuan Qi,Jae-Joon Han 摘要:视频全景分割(VPS)的目的是为每个像素分配一个类别标签,并在所有帧中一致地分割和识别所有对象实例。经典解决方案通常将VPS任务分解为多个子任务,并使用多个代理(例如框和掩码、中心和偏移)来表示对象。然而,这种分而治之的策略需要在空间和时间域进行复杂的后处理,并且容易受到代理任务失败的影响。在本文中,受以对象为中心的学习(学习紧凑而健壮的对象表示)的启发,我们提出了Slot-VPS,这是该任务的第一个端到端框架。我们使用一种称为全景槽(panoptic slots)的统一表示,对视频中的所有全景实体进行编码,包括前景实例和背景语义。所提出的视频全景检索器(Video Panoptic Retriever)将连贯的时空对象信息检索并编码到全景槽中,使其能够以统一的方式定位、分割、区分和关联对象。最后,输出的全景槽可以直接转换为视频中全景对象的类别、掩码和对象ID。我们进行了广泛的消融研究,并在两个基准数据集Cityscapes-VPS(\textit{val}和测试集)和VIPER(\textit{val}集)上证明了我们方法的有效性,分别达到了63.7、63.3和56.2 VPQ的最新性能。 摘要:Video Panoptic Segmentation (VPS) aims at assigning a class label to each pixel, uniquely segmenting and identifying all object instances consistently across all frames. Classic solutions usually decompose the VPS task into several sub-tasks and utilize multiple surrogates (e.g. boxes and masks, centres and offsets) to represent objects. However, this divide-and-conquer strategy requires complex post-processing in both spatial and temporal domains and is vulnerable to failures from surrogate tasks. In this paper, inspired by object-centric learning which learns compact and robust object representations, we present Slot-VPS, the first end-to-end framework for this task. We encode all panoptic entities in a video, including both foreground instances and background semantics, with a unified representation called panoptic slots. The coherent spatio-temporal object's information is retrieved and encoded into the panoptic slots by the proposed Video Panoptic Retriever, enabling it to localize, segment, differentiate, and associate objects in a unified manner. Finally, the output panoptic slots can be directly converted into the class, mask, and object ID of panoptic objects in the video. We conduct extensive ablation studies and demonstrate the effectiveness of our approach on two benchmark datasets, Cityscapes-VPS (\textit{val} and test sets) and VIPER (\textit{val} set), achieving new state-of-the-art performance of 63.7, 63.3 and 56.2 VPQ, respectively.

【2】 Learning Rich Representation of Keyphrases from Text 标题:从文本中学习关键短语的丰富表示 链接:https://arxiv.org/abs/2112.08547

作者:Mayank Kulkarni,Debanjan Mahata,Ravneet Arora,Rajarshi Bhowmik 摘要:在这项工作中,我们探索了如何学习任务特定的语言模型,目的是从文本文档中学习关键短语的丰富表示。我们用不同的掩蔽策略,在判别式和生成式两种设置下对Transformer语言模型(LMs)进行预训练。在判别式设置中,我们引入了一个新的预训练目标:关键短语边界填充替换(KBIR)。当使用KBIR预训练的LM针对关键短语提取任务进行微调时,与SOTA相比,该目标带来了很大的性能提升(F1中高达9.26个点)。在生成式设置中,我们为BART引入了一种新的预训练设置:KeyBART,它以CatSeq格式再现与输入文本相关的关键短语,而不是去噪后的原始输入。这也使关键短语生成的性能超过了SOTA(F1@M中高达4.33个点)。此外,我们还在命名实体识别(NER)、问答(QA)、关系提取(RE)、抽象摘要等任务上对预训练语言模型进行了微调,并取得了与SOTA相当的性能,这表明学习关键短语的丰富表示确实有利于许多其他基本NLP任务。 摘要:In this work, we explore how to learn task-specific language models aimed towards learning rich representation of keyphrases from text documents. We experiment with different masking strategies for pre-training transformer language models (LMs) in discriminative as well as generative settings. In the discriminative setting, we introduce a new pre-training objective - Keyphrase Boundary Infilling with Replacement (KBIR), showing large gains in performance (upto 9.26 points in F1) over SOTA, when LM pre-trained using KBIR is fine-tuned for the task of keyphrase extraction. In the generative setting, we introduce a new pre-training setup for BART - KeyBART, that reproduces the keyphrases related to the input text in the CatSeq format, instead of the denoised original input. This also led to gains in performance (upto 4.33 points in F1@M) over SOTA for keyphrase generation. Additionally, we also fine-tune the pre-trained language models on named entity recognition (NER), question answering (QA), relation extraction (RE), abstractive summarization and achieve comparable performance with that of the SOTA, showing that learning rich representation of keyphrases is indeed beneficial for many other fundamental NLP tasks.

优化|敛散性(4篇)

【1】 BoGraph: Structured Bayesian Optimization From Logs for Systems with High-dimensional Parameter Space 标题:BoGraph:高维参数空间系统的日志结构化贝叶斯优化 链接:https://arxiv.org/abs/2112.08774

作者:Sami Alabed,Eiko Yoneki 摘要:当前的自动调优框架由于参数空间大、相互依赖性复杂和评估成本高而难以调整计算机系统配置。利用概率模型,结构化贝叶斯优化(SBO)最近克服了这些困难。SBO利用系统专家提供的上下文信息分解参数空间,从而实现快速收敛。然而,建立概率模型的复杂性阻碍了它的广泛应用。我们提出了BoAnon,一个SBO框架,它从日志中学习系统结构。BoAnon提供了一个API,使专家能够将系统知识编码为性能模型或组件依赖关系。BoAnon将学习到的结构转化为概率图模型,然后将专家提供的知识应用到该图上,以进一步将系统行为上下文化。BoAnon的概率图允许优化器比其他方法更快地找到高效配置。我们通过一个硬件架构搜索问题对BoAnon进行评估,与默认架构相比,在能耗-延迟目标上实现了$5-7$倍的改进。凭借其新颖的上下文结构学习管道,BoAnon使SBO可用于数据库和流处理器等其他广泛的计算机系统。 摘要:Current auto-tuning frameworks struggle with tuning computer systems configurations due to their large parameter space, complex interdependencies, and high evaluation cost. Utilizing probabilistic models, Structured Bayesian Optimization (SBO) has recently overcome these difficulties. SBO decomposes the parameter space by utilizing contextual information provided by system experts leading to fast convergence. However, the complexity of building probabilistic models has hindered its wider adoption. We propose BoAnon, a SBO framework that learns the system structure from its logs. BoAnon provides an API enabling experts to encode knowledge of the system as performance models or components dependency. BoAnon takes in the learned structure and transforms it into a probabilistic graph model. Then it applies the expert-provided knowledge to the graph to further contextualize the system behavior. BoAnon's probabilistic graph allows the optimizer to find efficient configurations faster than other methods. We evaluate BoAnon via a hardware architecture search problem, achieving an improvement in energy-latency objectives ranging from $5-7$ x-factors over the default architecture. With its novel contextual structure learning pipeline, BoAnon makes using SBO accessible for a wide range of other computer systems such as databases and stream processors.

【2】 Constrained multi-objective optimization of process design parameters in settings with scarce data: an application to adhesive bonding 标题:稀缺数据环境下工艺设计参数的约束多目标优化:在胶粘剂粘接中的应用 链接:https://arxiv.org/abs/2112.08760

作者:Alejandro Morales-Hernández,Sebastian Rojas Gonzalez,Inneke Van Nieuwenhuyse,Jeroen Jordens,Maarten Witters,Bart Van Doninck 摘要:胶接接头由于其良好的特性,如高强度重量比、设计灵活性、有限的应力集中、平面力传递、良好的损伤容限和抗疲劳性,在工业中的应用越来越广泛。寻找粘合剂粘合工艺的最佳工艺参数具有挑战性:该优化具有内在的多目标性(旨在最大限度地提高断裂强度,同时最小化成本)和约束性(该过程不应导致材料的任何外观损坏,且应力测试不应导致与附着力相关的故障)。在实验室里进行现实生活中的物理实验很昂贵;传统的进化方法(如遗传算法)不适合解决这个问题,因为评估所需的实验量太大。在本研究中,我们成功地应用了特定的机器学习技术(高斯过程回归和逻辑回归),在有限的实验数据基础上模拟了目标函数和约束函数。这些技术嵌入到贝叶斯优化算法中,该算法以高效的方式成功地检测帕累托最优过程设置(即,需要有限数量的额外实验)。 摘要:Adhesive joints are increasingly used in industry for a wide variety of applications because of their favorable characteristics such as high strength-to-weight ratio, design flexibility, limited stress concentrations, planar force transfer, good damage tolerance and fatigue resistance. Finding the optimal process parameters for an adhesive bonding process is challenging: the optimization is inherently multi-objective (aiming to maximize break strength while minimizing cost) and constrained (the process should not result in any visual damage to the materials, and stress tests should not result in failures that are adhesion-related). Real life physical experiments in the lab are expensive to perform; traditional evolutionary approaches (such as genetic algorithms) are then ill-suited to solve the problem, due to the prohibitive amount of experiments required for evaluation. In this research, we successfully applied specific machine learning techniques (Gaussian Process Regression and Logistic Regression) to emulate the objective and constraint functions based on a limited amount of experimental data. The techniques are embedded in a Bayesian optimization algorithm, which succeeds in detecting Pareto-optimal process settings in a highly efficient way (i.e., requiring a limited number of extra experiments).
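下面是这两类代理模型如何组合进约束贝叶斯优化的一个示意:用高斯过程回归拟合断裂强度、用逻辑回归拟合可行性,再以"期望改进乘以可行概率"作为采集分数挑选下一组实验。数据、核函数与参数范围均为假设示例,并非论文的具体实现。

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.linear_model import LogisticRegression
from scipy.stats import norm

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(30, 2))                  # 已完成的工艺参数实验(假设)
strength = np.sin(3 * X[:, 0]) + X[:, 1]             # 目标:断裂强度(假设)
feasible = (X.sum(axis=1) < 1.4).astype(int)         # 约束:是否无外观损伤(假设)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, strength)
clf = LogisticRegression().fit(X, feasible)

def acquisition(x_cand):
    """期望改进 x 可行概率:兼顾目标与约束的采集分数。"""
    mu, sd = gp.predict(x_cand, return_std=True)
    best = strength[feasible == 1].max()
    z = (mu - best) / np.maximum(sd, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)
    p_feas = clf.predict_proba(x_cand)[:, 1]
    return ei * p_feas

cand = rng.uniform(0, 1, size=(500, 2))
print(cand[np.argmax(acquisition(cand))])            # 下一组建议实验的工艺参数
```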

【3】 Predictive Price-Performance Optimization for Serverless Query Processing 标题:面向无服务器查询处理的预测性性价比优化 链接:https://arxiv.org/abs/2112.08572

作者:Rathijit Sen,Abhishek Roy,Alekh Jindal 摘要:我们为预测性资源分配提供了一个高效的参数化建模框架,重点关注计算资源的数量,该框架可以为无服务器查询处理设置中的数据分析优化一系列性价比目标。我们深入讨论并评估了我们的系统AutoExecutor如何使用此框架为Azure Synapse上运行的Spark SQL查询自动选择接近最佳的执行器和核心计数。我们的技术在Spark的内置、反应式、动态执行器分配功能的基础上进行了改进,在运行查询时大大减少了已分配的执行器总数和执行器占用率,从而释放了可能被其他并发查询使用的执行器,或减少了总体群集资源调配需求。与执行后分析工具(如Sparklens)相比,我们在执行查询之前预测查询的资源分配,还可以考虑输入数据大小的变化,以预测所需的分配。 摘要:We present an efficient, parametric modeling framework for predictive resource allocations, focusing on the amount of computational resources, that can optimize for a range of price-performance objectives for data analytics in serverless query processing settings. We discuss and evaluate in depth how our system, AutoExecutor, can use this framework to automatically select near-optimal executor and core counts for Spark SQL queries running on Azure Synapse. Our techniques improve upon Spark's in-built, reactive, dynamic executor allocation capabilities by substantially reducing the total executors allocated and executor occupancy while running queries, thereby freeing up executors that can potentially be used by other concurrent queries or in reducing the overall cluster provisioning needs. In contrast with post-execution analysis tools such as Sparklens, we predict resource allocations for queries before executing them and can also account for changes in input data sizes for predicting the desired allocations.

【4】 OptABC: an Optimal Hyperparameter Tuning Approach for Machine Learning Algorithms 标题:OptABC:一种机器学习算法的最优超参数整定方法 链接:https://arxiv.org/abs/2112.08511

作者:Leila Zahedi,Farid Ghareh Mohammadi,M. Hadi Amini 备注:8 pages 摘要:机器学习算法中的超参数整定是一项具有计算挑战性的任务,因为该问题的规模很大。为了开发一种高效的超参数调整策略,一个很有希望的解决方案是使用群体智能算法。人工蜂群优化算法是一种很有前途的高效优化算法。然而,在某些情况下,由于解的初始总体较差和目标函数昂贵,ABC可能会遇到收敛速度慢或执行时间长的问题。为了解决这些问题,提出了一种新的算法OptABC,以帮助ABC算法更快地收敛到近似最优解。OptABC集成了人工蜂群算法、K-均值聚类、贪婪算法和基于对立的学习策略,用于调整不同机器学习模型的超参数。OptABC采用这些技术试图使初始种群多样化,从而在不显著降低精度的情况下增强收敛能力。为了验证所提出的方法的性能,我们将结果与以前最先进的方法进行了比较。实验结果表明,与文献中现有的方法相比,OptABC是有效的。 摘要:Hyperparameter tuning in machine learning algorithms is a computationally challenging task due to the large-scale nature of the problem. In order to develop an efficient strategy for hyper-parameter tuning, one promising solution is to use swarm intelligence algorithms. Artificial Bee Colony (ABC) optimization lends itself as a promising and efficient optimization algorithm for this purpose. However, in some cases, ABC can suffer from a slow convergence rate or execution time due to the poor initial population of solutions and expensive objective functions. To address these concerns, a novel algorithm, OptABC, is proposed to help ABC algorithm in faster convergence toward a near-optimum solution. OptABC integrates artificial bee colony algorithm, K-Means clustering, greedy algorithm, and opposition-based learning strategy for tuning the hyper-parameters of different machine learning models. OptABC employs these techniques in an attempt to diversify the initial population, and hence enhance the convergence ability without significantly decreasing the accuracy. In order to validate the performance of the proposed method, we compare the results with previous state-of-the-art approaches. Experimental results demonstrate the effectiveness of the OptABC compared to existing approaches in the literature.
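OptABC 借助的"基于对立的学习"(OBL)思想可以用几行代码说明:对每个随机初始解 $x$,同时评估其关于搜索区间的对立点 $\tilde{x}=lb+ub-x$,保留两者中目标值更好的一个,从而改善初始种群。下面的目标函数与搜索区间均为假设示例,并非 OptABC 的完整实现。

```python
import numpy as np

rng = np.random.default_rng(0)
lb, ub = np.array([0.0, 0.0]), np.array([10.0, 10.0])   # 超参数搜索区间(假设)

def objective(x):
    """待最小化的目标(假设示例),x 形状为 (..., 2)。"""
    return np.sum((x - np.array([3.0, 7.0])) ** 2, axis=-1)

def opposition_based_init(n_pop):
    """生成随机种群及其对立种群,逐个保留目标值更好的个体。"""
    pop = rng.uniform(lb, ub, size=(n_pop, 2))
    opp = lb + ub - pop                      # 对立点
    better = objective(opp) < objective(pop)
    pop[better] = opp[better]
    return pop

init = opposition_based_init(20)
print(objective(init).mean())   # 通常低于纯随机初始化的平均目标值
```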

预测|估计(9篇)

【1】 Forecasting sales with Bayesian networks: a case study of a supermarket product in the presence of promotions 标题:贝叶斯网络在销售预测中的应用--以某超市产品促销为例 链接:https://arxiv.org/abs/2112.08706

作者:Muhammad Hamza,Mahdi Abolghasemi,Abraham Oshni Alvandi 摘要:销售预测是供应链中许多管理决策的前提,如生产计划、物料资源计划和预算。促销是最重要的商业策略之一,通常用于促进销售。虽然促销对产生需求很有吸引力,但在促销的情况下,往往很难预测需求。在过去的几十年中,已经开发了几种定量模型来预测销售额,包括统计模型和机器学习模型。然而,这些方法可能不足以说明可能影响销售的所有内部和外部因素。因此,定性模型与定量方法一起被采用,因为咨询专家已被证明通过提供上下文信息来提高预测准确性。此类模型被广泛用于解释可能导致销售快速变化的因素,如促销期间。在本文中,我们的目标是使用贝叶斯网络预测促销销售,其中价格、促销类型和产品位置等因素的组合会影响销售。我们选择开发BN模型,因为BN模型基本上具有将各种定性和定量因素与因果形式相结合的能力,使其成为促销期间销售预测的有吸引力的工具。在本案例研究中,这可用于调整公司的促销策略。我们从在澳大利亚销售产品的零售商处收集特定产品的销售数据。我们为该产品开发了一个贝叶斯网络,并通过实证分析验证了我们的结果。本文证实了BNs可以有效地用于预测销售,尤其是在促销期间。最后,我们为BNs在销售预测中的应用提供了一些研究途径。 摘要:Sales forecasting is the prerequisite for a lot of managerial decisions such as production planning, material resource planning and budgeting in the supply chain. Promotions are one of the most important business strategies that are often used to boost sales. While promotions are attractive for generating demand, it is often difficult to forecast demand in their presence. In the past few decades, several quantitative models have been developed to forecast sales including statistical and machine learning models. However, these methods may not be adequate to account for all the internal and external factors that may impact sales. As a result, qualitative models have been adopted along with quantitative methods as consulting experts has been proven to improve forecast accuracy by providing contextual information. Such models are being used extensively to account for factors that can lead to a rapid change in sales, such as during promotions. In this paper, we aim to use Bayesian Networks to forecast promotional sales where a combination of factors such as price, type of promotions, and product location impacts sales. We choose to develop a BN model because BN models essentially have the capability to combine various qualitative and quantitative factors with causal forms, making it an attractive tool for sales forecasting during promotions. This can be used to adjust a company's promotional strategy in the context of this case study. We gather sales data for a particular product from a retailer that sells products in Australia. We develop a Bayesian Network for this product and validate our results by empirical analysis. This paper confirms that BNs can be effectively used to forecast sales, especially during promotions. In the end, we provide some research avenues for using BNs in forecasting sales.

【2】 A Statistics and Deep Learning Hybrid Method for Multivariate Time Series Forecasting and Mortality Modeling 标题:多元时间序列预测与死亡率建模的统计与深度学习混合方法 链接:https://arxiv.org/abs/2112.08618

作者:Thabang Mathonsi,Terence L. van Zyl 摘要:混合方法在预测任务和量化这些预测的相关不确定性(预测区间)方面的表现优于纯统计和纯深度学习方法。一个例子是指数平滑递归神经网络(ES-RNN),它是统计预测模型和递归神经网络变体之间的混合。ES-RNN在Makridakis-4预测比赛中将绝对误差改善了9.4%。这一改进和其他混合模型的类似表现主要仅在单变量数据集上得到证明。将混合预测方法应用于多变量数据的困难包括($i$)对非简约模型进行超参数调整所涉及的高计算成本,($ii$)与数据固有的自相关性相关的挑战,以及($iii$)可能难以捕捉的协变量之间的复杂依赖性(互相关)。本文提出了多元指数平滑长短时记忆(MES-LSTM),它是ES-RNN的一个广义多元扩展,克服了这些挑战。MES-LSTM采用矢量化实现。我们在几个聚合的2019冠状病毒病(COVID-19)发病率数据集上测试MES-LSTM,发现我们的混合方法在预测精度和预测区间构建方面比纯统计和深度学习方法有一致、显著的改进。 摘要:Hybrid methods have been shown to outperform pure statistical and pure deep learning methods at forecasting tasks and quantifying the associated uncertainty with those forecasts (prediction intervals). One example is Exponential Smoothing Recurrent Neural Network (ES-RNN), a hybrid between a statistical forecasting model and a recurrent neural network variant. ES-RNN achieves a 9.4\% improvement in absolute error in the Makridakis-4 Forecasting Competition. This improvement and similar outperformance from other hybrid models have primarily been demonstrated only on univariate datasets. Difficulties with applying hybrid forecast methods to multivariate data include ($i$) the high computational cost involved in hyperparameter tuning for models that are not parsimonious, ($ii$) challenges associated with auto-correlation inherent in the data, as well as ($iii$) complex dependency (cross-correlation) between the covariates that may be hard to capture. This paper presents Multivariate Exponential Smoothing Long Short Term Memory (MES-LSTM), a generalized multivariate extension to ES-RNN, that overcomes these challenges. MES-LSTM utilizes a vectorized implementation. We test MES-LSTM on several aggregated coronavirus disease of 2019 (COVID-19) morbidity datasets and find our hybrid approach shows consistent, significant improvement over pure statistical and deep learning methods at forecast accuracy and prediction interval construction.

【3】 Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context 标题:蒙版测量预测:学习从文本上下文中联合预测量和单位 链接:https://arxiv.org/abs/2112.08616

作者:Daniel Spokoyny,Ivan Lee,Zhao Jin,Taylor Berg-Kirkpatrick 备注:Preprint 摘要:物理测量在学术论文、工程报告和网络表格中占很大一部分。目前的基准未能正确评估预先训练的语言模型在测量方面的计算能力,阻碍了开发新方法并将其应用于数值任务的研究。为此,我们引入了一个新的任务,蒙蔽测量预测(MMP),模型学习在给定蒙蔽文本的情况下重建一个数字及其相关单元。MMP对于训练新的数字信息模型以及评估现有系统的计算能力都是有用的。为了解决这一问题,我们引入了一种新的生成掩蔽测量(GeMM)模型,该模型可以联合学习预测数字及其单位。我们将我们的模型与各种烧蚀和基线进行细粒度分析比较。我们使用传统预训练Transformer模型(RoBERTa)的线性探测来表明它们的性能明显低于联合训练的数字单元模型,突出了这项新任务的难度和我们提出的预训练方法的好处。我们希望这个框架能加速未来建立更强大的数值推理系统的进程。 摘要:Physical measurements constitute a large portion of numbers in academic papers, engineering reports, and web tables. Current benchmarks fall short of properly evaluating numeracy of pretrained language models on measurements, hindering research on developing new methods and applying them to numerical tasks. To that end, we introduce a novel task, Masked Measurement Prediction (MMP), where a model learns to reconstruct a number together with its associated unit given masked text. MMP is useful for both training new numerically informed models as well as evaluating numeracy of existing systems. In order to address this task, we introduce a new Generative Masked Measurement (GeMM) model that jointly learns to predict numbers along with their units. We perform fine-grained analyses comparing our model with various ablations and baselines. We use linear probing of traditional pretrained transformer models (RoBERTa) to show that they significantly underperform jointly trained number-unit models, highlighting the difficulty of this new task and the benefits of our proposed pretraining approach. We hope this framework accelerates the progress towards building more robust numerical reasoning systems in the future.

【4】 A prediction-based approach for online dynamic radiotherapy scheduling 标题:一种基于预测的在线动态放射治疗调度方法 链接:https://arxiv.org/abs/2112.08549

作者:Tu-San Pham,Antoine Legrain,Patrick De Causmaecker,Louis-Martin Rousseau 摘要:患者调度是一项困难的任务,因为它涉及到处理随机因素,例如未知的患者到达流量。为癌症患者安排放射治疗也面临类似的问题。治愈患者需要在建议的最后期限内开始治疗,即入院后14或28天,同时为需要在入院后1至3天内进行紧急治疗的姑息性患者保留治疗能力。大多数癌症中心通过为急诊病人保留固定数量的治疗时段来解决这个问题。然而,这种单一的预约方式并不理想,可能会导致急诊患者在某些天的治疗过期,而在其他一些天没有充分利用治疗能力,这也会导致治愈患者的治疗延迟。这一问题在拥挤的大型医院尤为严重。在本文中,我们提出了一种基于预测的在线动态放射治疗计划方法。一个离线问题,其中所有未来的病人到达都是已知的提前解决了最优使用整数规划。然后训练回归模型以识别患者到达模式与其理想等待时间之间的联系。然后将经过训练的回归模型嵌入基于预测的方法中,该方法根据患者的特征和日历的当前状态安排患者。数值结果表明,与其他基于扁平预约策略的调度方法相比,我们基于预测的方法有效地防止了急诊患者的逾期治疗,同时保持了良好的等待时间。 摘要:Patient scheduling is a difficult task as it involves dealing with stochastic factors such as an unknown arrival flow of patients. Scheduling radiotherapy treatments for cancer patients faces a similar problem. Curative patients need to start their treatment within the recommended deadlines, i.e., 14 or 28 days after their admission while reserving treatment capacity for palliative patients who require urgent treatments within 1 to 3 days after their admission. Most cancer centers solve the problem by reserving a fixed number of treatment slots for emergency patients. However, this flat-reservation approach is not ideal and can cause overdue treatments for emergency patients on some days while not fully exploiting treatment capacity on some other days, which also leads to delaying treatment for curative patients. This problem is especially severe in large and crowded hospitals. In this paper, we propose a prediction-based approach for online dynamic radiotherapy scheduling. An offline problem where all future patient arrivals are known in advance is solved to optimality using Integer Programming. A regression model is then trained to recognize the links between patients' arrival patterns and their ideal waiting time. The trained regression model is then embedded in a prediction-based approach that schedules a patient based on their characteristics and the present state of the calendar. The numerical results show that our prediction-based approach efficiently prevents overdue treatments for emergency patients while maintaining a good waiting time compared to other scheduling approaches based on a flat-reservation policy.

【5】 Predicting Levels of Household Electricity Consumption in Low-Access Settings 标题:低接入环境下的家庭用电量水平预测 链接:https://arxiv.org/abs/2112.08497

作者:Simone Fobi,Joel Mugyenyi,Nathaniel J. Williams,Vijay Modi,Jay Taneja 备注:Accepted to be published in Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV) 2022 摘要:在低收入环境中,电力公司最关键的信息是客户的预期消费。在很大一部分家庭尚未通电的情况下,很难进行用电量评估。在此类设置中,预期消耗量的绝对水平可能在5-100 kWh/月之间,导致这些客户之间的高度可变性。如果低消费群体中有相当一部分人与高消费群体相关联,那么宝贵的资源就岌岌可危。这是第一次在低收入环境下进行此类研究,试图预测建筑物的消费量,而不是总行政区域的消费量。我们使用来自肯尼亚20000个地理参考电力客户(占肯尼亚居民客户的0.01%)的公用事业账单样本,在电气化前的日间卫星图像上训练卷积神经网络(CNN)。这是通过两阶段方法实现的,该方法使用一种新的建筑物分割方法,利用大量的免费卫星图像,最大限度地利用稀缺和昂贵的客户数据。我们的方法表明,可以在建筑水平上实现竞争精度,解决消费变化的挑战。这项工作表明,建筑的特征及其周围环境在预测消费水平方面都很重要。我们还评估了在训练过程中添加低分辨率地理空间数据集的情况,包括夜间灯光和人口普查数据。通过对肯尼亚单个建筑的精细预测,结果已经有助于为选址和分布水平规划提供信息,没有理由不能推广到其他国家。 摘要:In low-income settings, the most critical piece of information for electric utilities is the anticipated consumption of a customer. Electricity consumption assessment is difficult to do in settings where a significant fraction of households do not yet have an electricity connection. In such settings the absolute levels of anticipated consumption can range from 5-100 kWh/month, leading to high variability amongst these customers. Precious resources are at stake if a significant fraction of low consumers are connected over those with higher consumption. This is the first study of it's kind in low-income settings that attempts to predict a building's consumption and not that of an aggregate administrative area. We train a Convolutional Neural Network (CNN) over pre-electrification daytime satellite imagery with a sample of utility bills from 20,000 geo-referenced electricity customers in Kenya (0.01% of Kenya's residential customers). This is made possible with a two-stage approach that uses a novel building segmentation approach to leverage much larger volumes of no-cost satellite imagery to make the most of scarce and expensive customer data. Our method shows that competitive accuracies can be achieved at the building level, addressing the challenge of consumption variability. This work shows that the building's characteristics and it's surrounding context are both important in predicting consumption levels. We also evaluate the addition of lower resolution geospatial datasets into the training process, including nighttime lights and census-derived data. The results are already helping inform site selection and distribution-level planning, through granular predictions at the level of individual structures in Kenya and there is no reason this cannot be extended to other countries.

【6】 Event-Aware Multimodal Mobility Nowcasting 标题:事件感知多模式移动性现在广播 链接:https://arxiv.org/abs/2112.08443

作者:Zhaonan Wang,Renhe Jiang,Hao Xue,Flora D. Salim,Xuan Song,Ryosuke Shibasaki 备注:Accepted by AAAI 2022 摘要:作为移动即服务(MaaS)成功的决定性部分,人群移动的时空预测建模是一项具有挑战性的任务,特别是考虑到社会事件导致移动行为偏离常态的场景。虽然通过深入学习在高水平时空规律建模方面取得了巨大进展,但大多数(如果不是所有的话)现有方法既不了解多种运输模式之间的动态相互作用,也不适应潜在社会事件带来的前所未有的波动。在本文中,我们从两个角度对规范时空网络(ST-Net)进行了改进:(1)设计一个异构移动信息网络(HMIN),以明确表示多模移动中的多模性;(2) 提出了一种内存增强动态滤波器生成器(MDFG),用于在各种场景中动态生成特定于序列的参数。增强的事件感知时空网络,即EAST网络,在多个真实世界数据集上进行评估,这些数据集具有广泛的社会事件种类和覆盖范围。定量和定性实验结果都验证了我们的方法与最新基线相比的优越性。代码和数据发布在https://github.com/underdoc-wang/EAST-Net. 摘要:As a decisive part in the success of Mobility-as-a-Service (MaaS), spatio-temporal predictive modeling for crowd movements is a challenging task particularly considering scenarios where societal events drive mobility behavior deviated from the normality. While tremendous progress has been made to model high-level spatio-temporal regularities with deep learning, most, if not all of the existing methods are neither aware of the dynamic interactions among multiple transport modes nor adaptive to unprecedented volatility brought by potential societal events. In this paper, we are therefore motivated to improve the canonical spatio-temporal network (ST-Net) from two perspectives: (1) design a heterogeneous mobility information network (HMIN) to explicitly represent intermodality in multimodal mobility; (2) propose a memory-augmented dynamic filter generator (MDFG) to generate sequence-specific parameters in an on-the-fly fashion for various scenarios. The enhanced event-aware spatio-temporal network, namely EAST-Net, is evaluated on several real-world datasets with a wide variety and coverage of societal events. Both quantitative and qualitative experimental results verify the superiority of our approach compared with the state-of-the-art baselines. Code and data are published on https://github.com/underdoc-wang/EAST-Net.

【7】 A Deep Learning Based Multitask Network for Respiration Rate Estimation -- A Practical Perspective 标题:一种基于深度学习的多任务呼吸频率估计网络--实用视角 链接:https://arxiv.org/abs/2112.09071

作者:Kapil Singh Rathore,Sricharan Vijayarangan,Preejith SP,Mohanasankar Sivaprakasam 备注:A DL based multitasking model to estimate respiratory rate is proposed. Paper is accepted in IEEE HI-POCT 2022 摘要:可穿戴传感器的指数级增长引起了人们对日常活动中生理参数评估的极大兴趣。呼吸速率是生活方式活动绩效评估中使用的重要参数之一。然而,用于测量、运动伪影和其他噪声的突出设置使过程复杂化。本文提出了一种基于深度学习(DL)的多任务体系结构,用于根据ECG和加速计信号估计瞬时和平均呼吸频率,从而在日常生活活动中(如骑自行车、走路、,多任务网络由编码器-解码器和编码器-增量网络组成,用于获取平均呼吸频率和呼吸信号。呼吸信号可用于获得呼吸峰值和瞬时呼吸周期。平均绝对误差(MAE)、均方根误差(RMSE)、推理时间和参数计数分析已用于将网络与当前最先进的机器学习(ML)模型和先前研究中开发的其他DL模型进行比较。作为工作的一部分,还开发了基于各种输入的其他DL配置。在不同的活动中,提议的模型显示出更好的总体准确性,并比个别模式提供更好的结果。 摘要:The exponential rise in wearable sensors has garnered significant interest in assessing the physiological parameters during day-to-day activities. Respiration rate is one of the vital parameters used in the performance assessment of lifestyle activities. However, obtrusive setup for measurement, motion artifacts, and other noises complicate the process. This paper presents a multitasking architecture based on Deep Learning (DL) for estimating instantaneous and average respiration rate from ECG and accelerometer signals, such that it performs efficiently under daily living activities like cycling, walking, etc. The multitasking network consists of a combination of Encoder-Decoder and Encoder-IncResNet, to fetch the average respiration rate and the respiration signal. The respiration signal can be leveraged to obtain the breathing peaks and instantaneous breathing cycles. Mean absolute error(MAE), Root mean square error (RMSE), inference time, and parameter count analysis has been used to compare the network with the current state of art Machine Learning (ML) model and other DL models developed in previous studies. Other DL configurations based on a variety of inputs are also developed as a part of the work. The proposed model showed better overall accuracy and gave better results than individual modalities during different activities.

【8】 Estimation of Physical Activity Level and Ambient Condition Thresholds for Respiratory Health using Smartphone Sensors 标题:使用智能手机传感器估算呼吸健康的体力活动水平和环境条件阈值 链接:https://arxiv.org/abs/2112.09068

作者:Chinazunwa Uwaoma 摘要:虽然体力活动被描述为慢性病的主要预防措施,但据报道,在不利环境条件下剧烈体力消耗也是慢性呼吸系统疾病恶化的主要原因。通过监测受影响个人的体力活动类型和水平来保持平衡,有助于降低管理呼吸道疾病的成本和负担。本文探讨了智能手机中运动传感器的潜力,以估计可能触发运动诱发呼吸条件(EiRCs)症状的体力活动阈值。重点是从嵌入式运动传感器中提取测量值,以确定个人呼吸健康可耐受的活动水平和活动类型。计算基于信号幅度面积(SMA)和能量消耗(EE)之间的相关性。我们还考虑了环境条件的变化,如温度和湿度的变化,作为体育锻炼过程中呼吸窘迫的因素。从健康个体收集的实时数据用于证明手机作为调节EIRC个体体力活动水平的工具的潜力。我们描述了一个实际情况,实验结果可用于促进良好的呼吸健康。 摘要:While physical activity has been described as a primary prevention against chronic diseases, strenuous physical exertion under adverse ambient conditions has also been reported as a major contributor to exacerbation of chronic respiratory conditions. Maintaining a balance by monitoring the type and the level of physical activities of affected individuals, could help in reducing the cost and burden of managing respiratory ailments. This paper explores the potentiality of motion sensors in Smartphones to estimate physical activity thresholds that could trigger symptoms of exercise induced respiratory conditions (EiRCs). The focus is on the extraction of measurements from the embedded motion sensors to determine the activity level and the type of activity that is tolerable to individuals respiratory health. The calculations are based on the correlation between Signal Magnitude Area (SMA) and Energy Expenditure (EE). We also consider the effect of changes in the ambient conditions like temperature and humidity, as contributing factors to respiratory distress during physical exercise. Real time data collected from healthy individuals were used to demonstrate the potentiality of a mobile phone as tool to regulate the level of physical activities of individuals with EiRCs. We describe a practical situation where the experimental outcomes can be applied to promote good respiratory health.
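摘要中的信号幅度面积(SMA)是加速度计活动强度的常用指标,通常按窗口计算 $\mathrm{SMA}=\frac{1}{T}\sum(|a_x|+|a_y|+|a_z|)$。下面按这一通用定义给出分窗计算的示意(采样率、窗口长度与数据均为假设),再对 SMA 与能量消耗(EE)序列求相关即可复现摘要中的分析思路。

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 50                                            # 采样率 50 Hz(假设)
acc = rng.normal(scale=0.5, size=(fs * 60, 3))     # 1 分钟的三轴加速度(假设数据,单位 g)

def sma(acc_xyz, fs, window_s=5):
    """按固定窗口计算信号幅度面积:SMA = (1/T) * sum(|ax|+|ay|+|az|)。"""
    win = fs * window_s
    n = len(acc_xyz) // win
    segs = acc_xyz[: n * win].reshape(n, win, 3)
    return np.abs(segs).sum(axis=(1, 2)) / window_s

sma_series = sma(acc, fs)
ee_series = 1.2 * sma_series + rng.normal(scale=5, size=len(sma_series))  # 假设的能量消耗序列
print(np.corrcoef(sma_series, ee_series)[0, 1])    # SMA 与 EE 的相关系数
```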

【9】 Simultaneous Multivariate Forecast of Space Weather Indices using Deep Neural Network Ensembles 标题:基于深度神经网络集成的空间天气指数多变量同时预报 链接:https://arxiv.org/abs/2112.09051

作者:Bernard Benson,Edward Brown,Stefano Bonasera,Giacomo Acciarini,Jorge A. Pérez-Hernández,Eric Sutton,Moriba K. Jah,Christopher Bridges,Meng Jin,Atılım Güneş Baydin 备注:Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021) 摘要:太阳射电通量和地磁指数是太阳活动及其影响的重要指标。耀斑和地磁暴等极端太阳事件可能对空间环境产生负面影响,包括低地球轨道上的卫星。因此,预测这些空间天气指数在空间操作和科学中具有重要意义。在这项研究中,我们提出了一个基于长短期记忆神经网络的模型来学习时间序列数据的分布,能够利用时间序列和太阳图像数据同时提供空间天气指数的多变量27天预测。我们显示,与单独使用时间序列数据相比,将太阳图像数据与时间序列数据合并时,均方根误差改善了30-40%。简单的基线,如持续性预测和滑动平均预测,也与经过训练的深度神经网络模型进行了比较。我们还使用模型集成对预测中的不确定性进行了量化。 摘要:Solar radio flux along with geomagnetic indices are important indicators of solar activity and its effects. Extreme solar events such as flares and geomagnetic storms can negatively affect the space environment including satellites in low-Earth orbit. Therefore, forecasting these space weather indices is of great importance in space operations and science. In this study, we propose a model based on long short-term memory neural networks to learn the distribution of time series data with the capability to provide a simultaneous multivariate 27-day forecast of the space weather indices using time series as well as solar image data. We show a 30-40\% improvement of the root mean-square error while including solar image data with time series data compared to using time series data alone. Simple baselines such as a persistence and running average forecasts are also compared with the trained deep neural network models. We also quantify the uncertainty in our prediction using a model ensemble.
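摘要中用模型集成量化预测不确定性,其统计处理方式可以用如下示意代码表达:训练若干个随机初始化不同的成员模型,用成员预测的均值作点预测、标准差作不确定性。为保持简短,这里用 sklearn 的 MLP 回归器代替论文中的 LSTM,数据为人造的 27 天周期序列,仅作示意。

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.arange(600)
flux = 100 + 20 * np.sin(2 * np.pi * t / 27) + rng.normal(scale=3, size=len(t))  # 假设的 F10.7 序列

# 用过去 27 天预测下一天(滑窗构造样本)
X = np.stack([flux[i : i + 27] for i in range(len(flux) - 27)])
y = flux[27:]

ensemble = [MLPRegressor(hidden_layer_sizes=(32,), max_iter=800, random_state=s).fit(X[:-50], y[:-50])
            for s in range(5)]

preds = np.stack([m.predict(X[-50:]) for m in ensemble])   # (成员数, 预测步数)
mean, std = preds.mean(axis=0), preds.std(axis=0)
print(mean[:3], std[:3])   # 点预测及其集成不确定性
```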

其他神经网络|深度学习|模型|建模(22篇)

【1】 Distributed neural network control with dependability guarantees: a compositional port-Hamiltonian approach 标题:具有可靠性保证的分布式神经网络控制:一种组合端口-哈密顿方法 链接:https://arxiv.org/abs/2112.09046

作者:Luca Furieri,Clara Lucía Galimberti,Muhammad Zakwan,Giancarlo Ferrari-Trecate 摘要:大规模网络物理系统要求控制策略是分布式的,也就是说,它们只依赖于本地实时测量和与相邻代理的通信。然而,即使在看似简单的情况下,最优分布式控制(ODC)问题也是非常棘手的。因此,最近的工作提出了训练神经网络(NN)分布式控制器。神经网络控制器的一个主要挑战是,它们在训练期间和训练后都不可靠,即闭环系统可能不稳定,并且训练可能由于梯度消失和爆炸而失败。在本文中,我们讨论非线性端口哈密顿(pH)系统网络的这些问题,其建模能力范围从能量系统到非完整载体和化学反应。具体而言,我们利用pH系统的组成特性来表征具有内置闭环稳定性保证的深哈密顿控制策略,而不考虑互连拓扑和所选NN参数。此外,我们的设置能够利用行为良好的神经常微分方程的最新结果,通过设计防止梯度消失的现象。数值实验证实了所提出结构的可靠性,同时与一般神经网络策略的性能相匹配。 摘要:Large-scale cyber-physical systems require that control policies are distributed, that is, that they only rely on local real-time measurements and communication with neighboring agents. Optimal Distributed Control (ODC) problems are, however, highly intractable even in seemingly simple cases. Recent work has thus proposed training Neural Network (NN) distributed controllers. A main challenge of NN controllers is that they are not dependable during and after training, that is, the closed-loop system may be unstable, and the training may fail due to vanishing and exploding gradients. In this paper, we address these issues for networks of nonlinear port-Hamiltonian (pH) systems, whose modeling power ranges from energy systems to non-holonomic vehicles and chemical reactions. Specifically, we embrace the compositional properties of pH systems to characterize deep Hamiltonian control policies with built-in closed-loop stability guarantees, irrespective of the interconnection topology and the chosen NN parameters. Furthermore, our setup enables leveraging recent results on well-behaved neural ODEs to prevent the phenomenon of vanishing gradients by design. Numerical experiments corroborate the dependability of the proposed architecture, while matching the performance of general neural network policies.

【2】 Advancing Residual Learning towards Powerful Deep Spiking Neural Networks 标题:向强大的深度尖峰神经网络推进残差学习 链接:https://arxiv.org/abs/2112.08954

作者:Yifan Hu,Yujie Wu,Lei Deng,Guoqi Li 摘要:尽管神经形态计算发展迅速,但尖峰神经网络(SNN)的容量和表示能力不足严重限制了其实际应用范围。剩余学习和捷径已被证明是训练深层神经网络的一种重要方法,但以前的工作很少评估它们对基于棘波的通信和时空动力学特征的适用性。在本文中,我们首先发现,这种疏忽导致阻碍信息流,并伴随着退化问题,在以前的剩余SNN。然后,我们提出了一种新的面向SNN的残差块MS ResNet,它能够显著扩展直接训练的SNN的深度,例如,在CIFAR-10上可以扩展到482层,在ImageNet上可以扩展到104层,而不会观察到任何轻微的退化问题。我们在基于帧和神经形态的数据集上验证了MS-ResNet104的有效性,MS-ResNet104在ImageNet上获得了76.02%的准确率,这是在直接训练SNN领域中的首次。我们还观察到,平均每个神经元只需要一个尖峰就可以对输入样本进行分类,这具有很高的能量效率。我们相信,我们强大且可扩展的模型将为SNN的进一步开发提供强大支持。 摘要:Despite the rapid progress of neuromorphic computing, inadequate capacity and insufficient representation power of spiking neural networks (SNNs) severely restrict their application scope in practice. Residual learning and shortcuts have been evidenced as an important approach for training deep neural networks, but rarely did previous work assess their applicability to the characteristics of spike-based communication and spatiotemporal dynamics. In this paper, we first identify that this negligence leads to impeded information flow and accompanying degradation problem in previous residual SNNs. Then we propose a novel SNN-oriented residual block, MS-ResNet, which is able to significantly extend the depth of directly trained SNNs, e.g. up to 482 layers on CIFAR-10 and 104 layers on ImageNet, without observing any slight degradation problem. We validate the effectiveness of MS-ResNet on both frame-based and neuromorphic datasets, and MS-ResNet104 achieves a superior result of 76.02% accuracy on ImageNet, the first time in the domain of directly trained SNNs. Great energy efficiency is also observed that on average only one spike per neuron is needed to classify an input sample. We believe our powerful and scalable models will provide a strong support for further exploration of SNNs.

【3】 Responsive parallelized architecture for deploying deep learning models in production environments 标题:用于在生产环境中部署深度学习模型的响应式并行体系结构 链接:https://arxiv.org/abs/2112.08933

作者:Nikhil Verma,Krishna Prasad 备注:20 Pages 摘要:招聘人员可以通过查看求职者的简历文件轻松地将求职者列入候选名单。非结构化文档CV显示候选组合和命名实体,列出详细信息。本研究的主要目的是设计并提出一个面向网络、高度响应的计算管道,该管道使用分层细化的标签注意网络系统地预测CV实体。 摘要:Recruiters can easily shortlist candidates for jobs via viewing their curriculum vitae document. Unstructured document CV beholds candidates portfolio and named entities listing details. The main aim of this study is to design and propose a web oriented, highly responsive, computational pipeline that systematically predicts CV entities using hierarchically refined label attention networks.

【4】 GOSH: Task Scheduling Using Deep Surrogate Models in Fog Computing Environments 标题:GOSH:雾计算环境下基于深度代理模型的任务调度 链接:https://arxiv.org/abs/2112.08916

作者:Shreshth Tuli,Giuliano Casale,Nicholas R. Jennings 备注:Accepted in IEEE Transactions on Parallel and Distributed Systems (Special Issue on PDC for AI), 2022 摘要:最近,使用代理模型的智能调度方法被提出来有效地分配异构fog环境中的易失性任务。确定性替代模型、深度神经网络(DNN)和基于梯度的优化等先进技术可以实现低能耗和响应时间。然而,确定性的替代模型,估计优化的目标值,不考虑服务质量(QoS)目标函数的分布的不确定性,这可能导致高服务水平协议(SLA)违规率。此外,DNN训练的脆弱性,阻止了此类模型达到最小能量或响应时间。为了克服这些困难,我们提出了一种新的调度器:GOSH,即使用二阶导数和异方差深度代理模型的基于梯度的优化。GOSH使用基于二阶梯度的优化方法来获得更好的QoS,并减少收敛到调度决策的迭代次数,从而降低调度时间。GOSH使用自然参数网络来近似客观分数,而不是普通的DNN。此外,置信下限优化方法允许GOSH通过基于错误的探索,在贪婪最小化平均延迟和减少不确定性之间找到最佳折衷。因此,GOSH及其基于联合仿真的扩展GOSH*能够快速适应,并比基线方法获得更好的客观分数。我们表明GOSH*比GOSH达到更好的客观分数,但它仅适用于高资源可用性设置,而GOSH适用于有限资源设置。GOSH和GOSH*的实际系统实验表明,在能耗、响应时间和SLA违反方面,与最新技术相比,分别提高了18%、27%和82%。 摘要:Recently, intelligent scheduling approaches using surrogate models have been proposed to efficiently allocate volatile tasks in heterogeneous fog environments. Advances like deterministic surrogate models, deep neural networks (DNN) and gradient-based optimization allow low energy consumption and response times to be reached. However, deterministic surrogate models, which estimate objective values for optimization, do not consider the uncertainties in the distribution of the Quality of Service (QoS) objective function that can lead to high Service Level Agreement (SLA) violation rates. Moreover, the brittle nature of DNN training and prevent such models from reaching minimal energy or response times. To overcome these difficulties, we present a novel scheduler: GOSH i.e. Gradient Based Optimization using Second Order derivatives and Heteroscedastic Deep Surrogate Models. GOSH uses a second-order gradient based optimization approach to obtain better QoS and reduce the number of iterations to converge to a scheduling decision, subsequently lowering the scheduling time. Instead of a vanilla DNN, GOSH uses a Natural Parameter Network to approximate objective scores. Further, a Lower Confidence Bound optimization approach allows GOSH to find an optimal trade-off between greedy minimization of the mean latency and uncertainty reduction by employing error-based exploration. Thus, GOSH and its co-simulation based extension GOSH*, can adapt quickly and reach better objective scores than baseline methods. We show that GOSH* reaches better objective scores than GOSH, but it is suitable only for high resource availability settings, whereas GOSH is apt for limited resource settings. Real system experiments for both GOSH and GOSH* show significant improvements against the state-of-the-art in terms of energy consumption, response time and SLA violations by up to 18, 27 and 82 percent, respectively.

【5】 Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling 标题:神经自回归序列建模中过平滑问题的表征与解决 链接:https://arxiv.org/abs/2112.08914

作者:Ilia Kulikov,Maksim Eremeev,Kyunghyun Cho 备注:Ilia Kulikov and Maksim Eremeev contributed equally 摘要:神经自回归序列模型把概率分散到许多可能的序列上,其中包括空序列或重复序列等退化序列。在这项工作中,我们处理一种特定情况:模型给不合理的短序列分配了过高的概率。我们定义过平滑率来量化这个问题。在确认神经机器翻译中存在高度过平滑之后,我们建议在训练期间显式地最小化过平滑率。我们进行了一组实验来研究所提出的正则化对模型分布和解码性能的影响。我们使用神经机器翻译任务作为测试床,并考虑三个大小不同的数据集。我们的实验揭示了三个主要发现。首先,我们可以通过调整正则化的强度来控制模型的过平滑率。第二,通过增大过平滑损失的权重,在不应该出现<eos>的位置,<eos>标记的概率和排名会大幅降低。第三,所提出的正则化会影响束搜索的结果,尤其是在使用大束宽时。在较低的过平滑率下,大束宽带来的翻译质量(以BLEU衡量)退化显著减轻,但与较小束宽相比,这种退化仍然存在。从这些观察结果,我们得出结论:高度的过平滑是神经自回归模型中过度可能的短序列这一退化情况背后的主要原因。 摘要:Neural autoregressive sequence models smear the probability among many possible sequences including degenerate ones, such as empty or repetitive sequences. In this work, we tackle one specific case where the model assigns a high probability to unreasonably short sequences. We define the oversmoothing rate to quantify this issue. After confirming the high degree of oversmoothing in neural machine translation, we propose to explicitly minimize the oversmoothing rate during training. We conduct a set of experiments to study the effect of the proposed regularization on both model distribution and decoding performance. We use a neural machine translation task as the testbed and consider three different datasets of varying size. Our experiments reveal three major findings. First, we can control the oversmoothing rate of the model by tuning the strength of the regularization. Second, by enhancing the oversmoothing loss contribution, the probability and the rank of the <eos> token decrease heavily at positions where it is not supposed to be. Third, the proposed regularization impacts the outcome of beam search especially when a large beam is used. The degradation of translation quality (measured in BLEU) with a large beam significantly lessens with lower oversmoothing rate, but the degradation compared to smaller beam sizes remains to exist. From these observations, we conclude that the high degree of oversmoothing is the main reason behind the degenerate case of overly probable short sequences in a neural autoregressive model.

【6】 Adapting Document-Grounded Dialog Systems to Spoken Conversations using Data Augmentation and a Noisy Channel Model 标题:使用数据增强和噪声信道模型使基于文档的对话系统适应口语对话 链接:https://arxiv.org/abs/2112.08844

作者:David Thulke,Nico Daheim,Christian Dugast,Hermann Ney 备注:Accepted to the DSTC10 workshop at AAAI 2022 摘要:本文总结了我们对第十届对话系统技术挑战(DSTC10)“基于知识的面向任务的口语对话建模”第二轨道任务2的提交。与前一年的迭代类似,该任务由三个子任务组成:检测一个回合是否是知识寻求,选择相关的知识文档,最后生成扎根的响应。今年,重点在于使系统适应嘈杂的ASR成绩单。我们探索了不同的方法,使模型对这种类型的输入更加健壮,并使生成的响应适应口语对话的风格。对于后者,我们使用噪声信道模型获得最佳结果,该模型还减少了短响应和一般响应的数量。我们最好的系统在挑战的自动评估中排名第一,在人类评估中排名第三。 摘要:This paper summarizes our submission to Task 2 of the second track of the 10th Dialog System Technology Challenge (DSTC10) "Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations". Similar to the previous year's iteration, the task consists of three subtasks: detecting whether a turn is knowledge seeking, selecting the relevant knowledge document and finally generating a grounded response. This year, the focus lies on adapting the system to noisy ASR transcripts. We explore different approaches to make the models more robust to this type of input and to adapt the generated responses to the style of spoken conversations. For the latter, we get the best results with a noisy channel model that additionally reduces the number of short and generic responses. Our best system achieved the 1st rank in the automatic and the 3rd rank in the human evaluation of the challenge.

【7】 DISTREAL: Distributed Resource-Aware Learning in Heterogeneous Systems 标题:DISTREAL:异构系统中的分布式资源感知学习 链接:https://arxiv.org/abs/2112.08761

作者:Martin Rapp,Ramin Khalili,Kilian Pfeiffer,Jörg Henkel 备注:to be published in AAAI Conference on Artificial Intelligence (AAAI'22) 摘要:我们研究了在具有异构、有限和时变计算资源可用性的设备上对神经网络(NNs)进行分布式训练的问题。我们提出了一种自适应、资源感知的设备上学习机制DISTREAL,它能够以分布式方式充分有效地利用设备上的可用资源,提高收敛速度。这是通过一种退出机制实现的,该机制通过随机丢弃模型卷积层的滤波器来动态调整训练神经网络的计算复杂性。我们的主要贡献是引入了一种设计空间探索(DSE)技术,该技术可以找到关于资源需求和训练收敛速度的帕累托最优每层退出向量。应用此技术,每个设备都能够动态选择适合其可用资源的退出向量,而无需服务器的任何帮助。我们在联邦学习(FL)系统中实现了我们的解决方案,在该系统中,计算资源的可用性随设备和时间的变化而变化,并通过广泛的评估表明,我们能够在不影响最终精度的情况下显著提高最新技术的收敛速度。 摘要:We study the problem of distributed training of neural networks (NNs) on devices with heterogeneous, limited, and time-varying availability of computational resources. We present an adaptive, resource-aware, on-device learning mechanism, DISTREAL, which is able to fully and efficiently utilize the available resources on devices in a distributed manner, increasing the convergence speed. This is achieved with a dropout mechanism that dynamically adjusts the computational complexity of training an NN by randomly dropping filters of convolutional layers of the model. Our main contribution is the introduction of a design space exploration (DSE) technique, which finds Pareto-optimal per-layer dropout vectors with respect to resource requirements and convergence speed of the training. Applying this technique, each device is able to dynamically select the dropout vector that fits its available resource without requiring any assistance from the server. We implement our solution in a federated learning (FL) system, where the availability of computational resources varies both between devices and over time, and show through extensive evaluation that we are able to significantly increase the convergence speed over the state of the art without compromising on the final accuracy.
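以下是一个极简草图,用"按层给定的比例随机屏蔽整组卷积滤波器"来模拟文中动态调节训练计算量的 dropout 机制;每层的丢弃比例在论文中由帕累托最优的设计空间探索(DSE)得到,这里直接手工给定,仅作示意。

```python
import torch
import torch.nn as nn

class FilterDropout(nn.Module):
    """训练时按比例 p 随机屏蔽整个输出通道(滤波器)。
    真实系统中会直接跳过被屏蔽滤波器的计算以节省资源,这里只示意其数学效果。"""
    def __init__(self, p):
        super().__init__()
        self.p = p

    def forward(self, x):                       # x: [N, C, H, W]
        if not self.training or self.p <= 0:
            return x
        keep = (torch.rand(x.size(1), device=x.device) >= self.p).float()
        return x * keep.view(1, -1, 1, 1) / (1.0 - self.p)   # 缩放以保持期望不变

# 假设的每层丢弃向量(实际应由资源约束下的DSE选出)
dropout_vector = [0.0, 0.25, 0.5]
layers, in_ch = [], 3
for p, out_ch in zip(dropout_vector, [16, 32, 64]):
    layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(), FilterDropout(p)]
    in_ch = out_ch
model = nn.Sequential(*layers)

if __name__ == "__main__":
    model.train()
    print(model(torch.randn(2, 3, 32, 32)).shape)
```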

【8】 Learning to Minimize Cost-to-Serve for Multi-Node Multi-Product Order Fulfilment in Electronic Commerce 标题:电子商务中多节点多产品订单执行的服务成本最小化研究 链接:https://arxiv.org/abs/2112.08736

作者:Pranavi Pathakota,Kunwar Zaid,Anulekha Dhara,Hardik Meisheri,Shaun D Souza,Dheeraj Shah,Harshad Khadilkar 摘要:我们描述了一个为响应零售电子商务(e-commerce)需求而提出的新型决策问题。在与物流和零售业业务合作伙伴合作时,我们发现从供应链中最合适的节点交付产品的成本(即服务成本,cost-to-serve,简称CTS)是一个关键挑战。电子商务供应链的大规模、高度随机性和巨大的地理分布使得此设置非常适合精心设计的数据驱动决策算法。在这项初步工作中,我们关注于在每个时间段内从任何仓库向多个客户交付任意数量的多个产品的特定子问题。我们比较了几种基线的相对性能和计算效率,包括启发式和混合整数线性规划。我们证明了基于强化学习的算法与这些策略相比是有竞争力的,并具有在现实世界中有效扩展的潜力。 摘要:We describe a novel decision-making problem developed in response to the demands of retail electronic commerce (e-commerce). While working with logistics and retail industry business collaborators, we found that the cost of delivery of products from the most opportune node in the supply chain (a quantity called the cost-to-serve or CTS) is a key challenge. The large scale, high stochasticity, and large geographical spread of e-commerce supply chains make this setting ideal for a carefully designed data-driven decision-making algorithm. In this preliminary work, we focus on the specific subproblem of delivering multiple products in arbitrary quantities from any warehouse to multiple customers in each time period. We compare the relative performance and computational efficiency of several baselines, including heuristics and mixed-integer linear programming. We show that a reinforcement learning based algorithm is competitive with these policies, with the potential of efficient scale-up in the real world.

【9】 Machine Learning-Accelerated Computational Solid Mechanics: Application to Linear Elasticity 标题:机器学习加速计算固体力学:在线弹性中的应用 链接:https://arxiv.org/abs/2112.08676

作者:Rajat Arora 备注:Accepted in 1st Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE). this https URL 摘要:这项工作提出了一种新的基于物理信息的深度学习超分辨率框架,用于从粗网格模拟或实验获得的低分辨率变形场重建高分辨率变形场。我们利用物理系统的控制方程和边界条件来训练模型,而不使用任何高分辨率标记数据。将该方法应用于从线弹性变形体的粗网格模拟中获得的低分辨率应力场和位移场中获得超分辨率变形场。我们证明,超分辨率场与以400倍粗网格分辨率运行的高级数值解算器的精度相匹配,同时满足控制定律。还对两种基于深度学习的超分辨率体系结构的性能进行了简要的评估研究。 摘要:This work presents a novel physics-informed deep learning based super-resolution framework to reconstruct high-resolution deformation fields from low-resolution counterparts, obtained from coarse mesh simulations or experiments. We leverage the governing equations and boundary conditions of the physical system to train the model without using any high-resolution labeled data. The proposed approach is applied to obtain the super-resolved deformation fields from the low-resolution stress and displacement fields obtained by running simulations on a coarse mesh for a body undergoing linear elastic deformation. We demonstrate that the super-resolved fields match the accuracy of an advanced numerical solver running at 400 times the coarse mesh resolution, while simultaneously satisfying the governing laws. A brief evaluation study comparing the performance of two deep learning based super-resolution architectures is also presented.

【10】 Intelligent Bearing Fault Diagnosis Method Combining Mixed Input and Hybrid CNN-MLP model 标题:混合输入和混合CNN-MLP模型相结合的智能轴承故障诊断方法 链接:https://arxiv.org/abs/2112.08673

作者:V. Sinitsin,O. Ibryaeva,V. Sakovskaya,V. Eremeeva 摘要:滚动轴承是工业机械中应用最广泛的轴承之一。滚动轴承状况的恶化可能导致旋转机械的全面故障。基于人工智能的方法广泛应用于滚动轴承的故障诊断。基于混合神经网络的方法已被证明能获得最佳的诊断结果。通常,原始数据由安装在机器外壳上的加速计生成。然而,每个信号的诊断效用高度依赖于相应加速计的位置。本文提出了一种新的基于CNN-MLP模型的混合诊断方法,该方法结合混合输入进行滚动轴承诊断。该方法利用安装在轴上的无线加速度传感器的加速度数据成功地检测和定位轴承缺陷。实验结果表明,该混合模型优于单独运行的CNN和MLP模型,对轴承故障的检测精度为99,6%,而CNN和MLP模型的检测精度分别为98%和81%。 摘要:Rolling bearings are one of the most widely used bearings in industrial machines. Deterioration in the condition of rolling bearings can result in the total failure of rotating machinery. AI-based methods are widely applied in the diagnosis of rolling bearings. Hybrid NN-based methods have been shown to achieve the best diagnosis results. Typically, raw data is generated from accelerometers mounted on the machine housing. However, the diagnostic utility of each signal is highly dependent on the location of the corresponding accelerometer. This paper proposes a novel hybrid CNN-MLP model-based diagnostic method which combines mixed input to perform rolling bearing diagnostics. The method successfully detects and localizes bearing defects using acceleration data from a shaft-mounted wireless acceleration sensor. The experimental results show that the hybrid model is superior to the CNN and MLP models operating separately, and can deliver a high detection accuracy of 99,6% for the bearing faults compared to 98% for CNN and 81% for MLP models.
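按摘要的描述,可以把"混合输入"理解为一维卷积分支处理原始加速度时序、MLP分支处理若干标量特征,再拼接分类;下面是一个这样的最小示意,其中通道数、辅助特征维度与类别数均为假设值,并非论文的原始网络配置。

```python
import torch
import torch.nn as nn

class HybridCNNMLP(nn.Module):
    def __init__(self, signal_len=1024, aux_dim=4, n_classes=5):
        super().__init__()
        self.cnn = nn.Sequential(                      # 处理原始加速度信号
            nn.Conv1d(1, 16, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.mlp = nn.Sequential(                      # 处理标量辅助特征(如转速、负载)
            nn.Linear(aux_dim, 16), nn.ReLU())
        self.head = nn.Linear(32 + 16, n_classes)

    def forward(self, signal, aux):
        return self.head(torch.cat([self.cnn(signal), self.mlp(aux)], dim=1))

if __name__ == "__main__":
    model = HybridCNNMLP()
    out = model(torch.randn(8, 1, 1024), torch.randn(8, 4))
    print(out.shape)   # torch.Size([8, 5])
```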

【11】 Learning to Prompt for Continual Learning 标题:学会促进持续学习 链接:https://arxiv.org/abs/2112.08654

作者:Zifeng Wang,Zizhao Zhang,Chen-Yu Lee,Han Zhang,Ruoxi Sun,Xiaoqi Ren,Guolong Su,Vincent Perot,Jennifer Dy,Tomas Pfister 摘要:持续学习背后的主流范式是使模型参数适应非平稳数据分布,其中灾难性遗忘是核心挑战。典型的方法依赖于测试时的预演缓冲区或已知任务标识来检索所学知识和解决遗忘问题,而这项工作提出了一种新的持续学习范式,旨在训练更简洁的记忆系统,而不需要在测试时访问任务标识。我们的方法学习动态提示(L2P)一个预先训练的模型,以便在不同的任务转换下顺序学习任务。在我们提出的框架中,提示是可学习的小参数,保存在内存空间中。目标是优化提示以指导模型预测,并在保持模型可塑性的同时明确管理任务不变和任务特定知识。我们在具有不同挑战性的连续学习设置的流行图像分类基准下进行综合实验,其中L2P始终优于现有的最先进方法。令人惊讶的是,L2P即使没有预演缓冲区,也能与基于预演的方法取得竞争性的结果,并且直接适用于具有挑战性的任务无关的持续学习。源代码可在https://github.com/google-research/l2p. 摘要:The mainstream paradigm behind continual learning has been to adapt the model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge. Typical methods rely on a rehearsal buffer or known task identity at test time to retrieve learned knowledge and address forgetting, while this work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time. Our method learns to dynamically prompt (L2P) a pre-trained model to learn tasks sequentially under different task transitions. In our proposed framework, prompts are small learnable parameters, which are maintained in a memory space. The objective is to optimize prompts to instruct the model prediction and explicitly manage task-invariant and task-specific knowledge while maintaining model plasticity. We conduct comprehensive experiments under popular image classification benchmarks with different challenging continual learning settings, where L2P consistently outperforms prior state-of-the-art methods. Surprisingly, L2P achieves competitive results against rehearsal-based methods even without a rehearsal buffer and is directly applicable to challenging task-agnostic continual learning. Source code is available at https://github.com/google-research/l2p.
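下面的草图示意了"提示池 + 按键匹配选取提示"的核心机制:用冻结骨干给出的特征作为查询,与可学习的键做余弦相似度,选取 top-k 个提示向量拼接到输入 token 序列之前;池大小、提示长度、top-k 等均为假设超参数,并非官方实现。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptPool(nn.Module):
    def __init__(self, pool_size=10, prompt_len=5, dim=768, top_k=3):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, dim))                 # 可学习的键
        self.prompts = nn.Parameter(torch.randn(pool_size, prompt_len, dim))  # 可学习的提示
        self.top_k = top_k

    def forward(self, query, tokens):
        """query: [B, dim] 冻结骨干给出的样本特征; tokens: [B, L, dim] 输入token序列。"""
        sim = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)  # [B, P]
        idx = sim.topk(self.top_k, dim=1).indices           # 每个样本选取 top-k 个提示
        selected = self.prompts[idx]                        # [B, k, prompt_len, dim]
        selected = selected.flatten(1, 2)                   # [B, k*prompt_len, dim]
        return torch.cat([selected, tokens], dim=1)

if __name__ == "__main__":
    pool = PromptPool()
    out = pool(torch.randn(2, 768), torch.randn(2, 197, 768))
    print(out.shape)   # [2, 3*5 + 197, 768]
```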

【12】 Learning Interpretable Models Through Multi-Objective Neural Architecture Search 标题:基于多目标神经结构搜索的可解释模型学习 链接:https://arxiv.org/abs/2112.08645

作者:Zachariah Carmichael,Tim Moon,Sam Ade Jacobs 摘要:深度学习的巨大进步在许多领域都带来了前所未有的成就。虽然深度神经网络的性能是无可置疑的,但这种模型的结构设计和可解释性并不重要。通过神经体系结构搜索(NAS)实现神经网络体系结构设计自动化的研究已经开始。最近的进展通过利用分布式计算和新的优化算法使这些方法更加实用。然而,在优化体系结构以实现可解释性方面几乎没有工作。为此,我们提出了一个多目标分布式NAS框架,该框架优化了任务性能和内省。我们利用非支配排序遗传算法(NSGA-II)和可解释人工智能(XAI)技术来奖励人类能够更好理解的体系结构。该框架在多个图像分类数据集上进行了评估。我们证明,对内省能力和任务错误进行联合优化会导致在可容忍的错误范围内执行更为分散的体系结构。 摘要:Monumental advances in deep learning have led to unprecedented achievements across a multitude of domains. While the performance of deep neural networks is indubitable, the architectural design and interpretability of such models are nontrivial. Research has been introduced to automate the design of neural network architectures through neural architecture search (NAS). Recent progress has made these methods more pragmatic by exploiting distributed computation and novel optimization algorithms. However, there is little work in optimizing architectures for interpretability. To this end, we propose a multi-objective distributed NAS framework that optimizes for both task performance and introspection. We leverage the non-dominated sorting genetic algorithm (NSGA-II) and explainable AI (XAI) techniques to reward architectures that can be better comprehended by humans. The framework is evaluated on several image classification datasets. We demonstrate that jointly optimizing for introspection ability and task error leads to more disentangled architectures that perform within tolerable error.

【13】 Learning To Retrieve Prompts for In-Context Learning 标题:学习检索提示以进行情景学习 链接:https://arxiv.org/abs/2112.08633

作者:Ohad Rubin,Jonathan Herzig,Jonathan Berant 摘要:情境学习是自然语言理解的一种新范式,一个大型的预训练语言模型(LM)观察一个测试实例和几个训练实例作为输入,并直接解码输出,而不需要对其参数进行任何更新。然而,已经证明,性能在很大程度上取决于所选的训练示例(称为提示)。在这项工作中,我们提出了一种使用注释数据和LM检索上下文学习提示的有效方法。给定一个输入-输出对,我们估计给定输入和一个候选训练示例作为提示的输出概率,并根据该概率将训练示例标记为正或负。然后,我们从这些数据中训练一个高效的密集检索器,用于在测试时作为提示检索训练示例。我们在三个序列对序列任务中评估了我们的方法,在这些任务中,语言话语被映射到意义表征,并且发现它大大优于先前的工作和多个基线。 摘要:In-context learning is a recent paradigm in natural language understanding, where a large pre-trained language model (LM) observes a test instance and a few training examples as its input, and directly decodes the output without any update to its parameters. However, performance has been shown to strongly depend on the selected training examples (termed prompt). In this work, we propose an efficient method for retrieving prompts for in-context learning using annotated data and a LM. Given an input-output pair, we estimate the probability of the output given the input and a candidate training example as the prompt, and label training examples as positive or negative based on this probability. We then train an efficient dense retriever from this data, which is used to retrieve training examples as prompts at test time. We evaluate our approach on three sequence-to-sequence tasks where language utterances are mapped to meaning representations, and find that it substantially outperforms prior work and multiple baselines across the board.
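下面的片段只示意"用语言模型给候选提示打分、再据此构造正负例"的标注步骤;其中 lm_log_prob(prompt, x, y) 是假设的打分接口(返回在给定候选示例与输入时生成目标输出的对数概率),需由具体的LM自行实现,检索器训练部分从略。

```python
def label_prompt_candidates(example, candidates, lm_log_prob, top_frac=0.25):
    """example: (x, y) 目标输入输出对; candidates: 候选训练示例列表。
    按 LM 给出的 log p(y | candidate, x) 排序,头部记为正例、尾部记为负例,
    用于后续训练稠密检索器(这里只示意标注逻辑)。"""
    x, y = example
    scored = sorted(candidates,
                    key=lambda c: lm_log_prob(prompt=c, x=x, y=y),
                    reverse=True)
    k = max(1, int(len(scored) * top_frac))
    return scored[:k], scored[-k:]          # (positives, negatives)

if __name__ == "__main__":
    # 用一个玩具打分函数演示调用方式(真实场景应替换为LM打分)
    toy_score = lambda prompt, x, y: -abs(len(prompt) - len(x))
    pos, neg = label_prompt_candidates(("show all flights", "SELECT ..."),
                                       ["list flights", "book a hotel room tonight"],
                                       toy_score)
    print(pos, neg)
```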

【14】 Leveraging the structure of dynamical systems for data-driven modeling 标题:利用动态系统的结构进行数据驱动建模 链接:https://arxiv.org/abs/2112.08458

作者:Alessandro Bucci,Onofrio Semeraro,Alexandre Allauzen,Sergio Chibbaro,Lionel Mathelin 摘要:许多科学领域都需要对复杂系统的时间行为进行可靠的预测。然而,这种强烈的兴趣受到建模问题的阻碍:通常,描述所考虑系统物理的控制方程是不可访问的,或者,即使已知,其求解也可能需要与预测时间约束不兼容的计算时间。如今,用一种通用的函数形式来近似手头的复杂系统,并完全依靠现有观测数据从零开始(ex nihilo)对其加以确定,已经成为一种普遍的做法,过去几年出现的大量科学工作就说明了这一点。基于深度神经网络的许多成功例子已经可用,尽管模型的泛化能力和保证裕度常常被忽视。在这里,我们考虑长短期记忆(LSTM)神经网络,深入研究训练集及其结构对长期预测质量的影响。利用遍历理论,我们分析了先验地保证得到物理系统可靠模型所需的数据量。我们展示了基于系统不变量和底层吸引子结构对训练集进行的知情设计如何显著改进所得模型,为主动学习背景下的研究开辟了途径。此外,还说明了在依赖具备记忆能力的模型时,记忆初始化带来的非平凡影响。我们的发现为任何复杂动力系统的有效数据驱动建模所需的数据量和数据选择提供了基于证据的良好实践。 摘要:The reliable prediction of the temporal behavior of complex systems is required in numerous scientific fields. This strong interest is however hindered by modeling issues: often, the governing equations describing the physics of the system under consideration are not accessible or, when known, their solution might require a computational time incompatible with the prediction time constraints. Nowadays, approximating complex systems at hand in a generic functional format and informing it ex nihilo from available observations has become a common practice, as illustrated by the enormous amount of scientific work appeared in the last years. Numerous successful examples based on deep neural networks are already available, although generalizability of the models and margins of guarantee are often overlooked. Here, we consider Long-Short Term Memory neural networks and thoroughly investigate the impact of the training set and its structure on the quality of the long-term prediction. Leveraging ergodic theory, we analyze the amount of data sufficient for a priori guaranteeing a faithful model of the physical system. We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models, opening up avenues for research within the context of active learning. Further, the non-trivial effects of the memory initializations when relying on memory-capable models will be illustrated. Our findings provide evidence-based good-practice on the amount and the choice of data required for an effective data-driven modeling of any complex dynamical system.

【15】 Climate-Invariant Machine Learning 标题:气候不变的机器学习 链接:https://arxiv.org/abs/2112.08440

作者:Tom Beucler,Michael Pritchard,Janni Yuval,Ankitesh Gupta,Liran Peng,Stephan Rasp,Fiaz Ahmed,Paul A. O'Gorman,J. David Neelin,Nicholas J. Lutsko,Pierre Gentine 备注:12+18 pages, 8+12 figures, 2+2 tables in the main text + supplementary information. Submitted to PNAS on December 14th, 2021 摘要:数据驱动算法,特别是神经网络,在高分辨率模拟数据上训练时,可以模拟粗分辨率气候模型中未分辨过程的影响;然而,当在未经训练的条件下进行评估时,它们往往会产生较大的泛化错误。在这里,我们建议物理地重新调整机器学习算法的输入和输出,以帮助它们推广到看不见的气候。应用于三种不同气候模型中的亚网格尺度热力学离线参数化,我们表明,重标度或“气候不变”神经网络能够在比训练气候温暖4K和8K的试验气候中进行准确预测。此外,“气候不变”神经网络促进了Aquaplanet和类地球模拟之间的泛化。通过可视化和归因方法,我们表明,与标准的机器学习模型相比,“气候不变”算法学习了风暴尺度对流、辐射及其天气热力环境之间更多的局部和鲁棒关系。总的来说,这些结果表明,明确地将物理知识纳入地球系统过程的数据驱动模型中,可以提高它们的一致性和跨气候机制的推广能力。 摘要:Data-driven algorithms, in particular neural networks, can emulate the effects of unresolved processes in coarse-resolution climate models when trained on high-resolution simulation data; however, they often make large generalization errors when evaluated in conditions they were not trained on. Here, we propose to physically rescale the inputs and outputs of machine learning algorithms to help them generalize to unseen climates. Applied to offline parameterizations of subgrid-scale thermodynamics in three distinct climate models, we show that rescaled or "climate-invariant" neural networks make accurate predictions in test climates that are 4K and 8K warmer than their training climates. Additionally, "climate-invariant" neural nets facilitate generalization between Aquaplanet and Earth-like simulations. Through visualization and attribution methods, we show that compared to standard machine learning models, "climate-invariant" algorithms learn more local and robust relations between storm-scale convection, radiation, and their synoptic thermodynamic environment. Overall, these results suggest that explicitly incorporating physical knowledge into data-driven models of Earth system processes can improve their consistency and ability to generalize across climate regimes.

【16】 torch.fx: Practical Program Capture and Transformation for Deep Learning in Python 标题:Torch.fx:Python中用于深度学习的实用程序捕获和转换 链接:https://arxiv.org/abs/2112.08429

作者:James K. Reed,Zachary DeVito,Horace He,Ansley Ussery,Jason Ansel 备注:14 pages, 8 figures, Submitted to MLSys 2022 摘要:现代深度学习框架提供了嵌入Python的命令式、即时执行(eager execution)的编程接口,以提供高效的开发体验。然而,深度学习实践者有时需要捕获和转换程序结构,以实现性能优化、可视化、分析和硬件集成。我们研究了用于深度学习的程序捕获和转换的不同设计。通过面向典型的深度学习用例而非长尾用例进行设计,可以创建一个更简单的程序捕获和转换框架。我们在torch.fx中应用了这一原则:它是PyTorch的一个程序捕获和转换库,完全用Python编写,并针对ML从业者的高开发效率进行了优化。我们通过案例研究展示了torch.fx如何支持PyTorch生态系统中此前无法实现的工作流。 摘要:Modern deep learning frameworks provide imperative, eager execution programming interfaces embedded in Python to provide a productive development experience. However, deep learning practitioners sometimes need to capture and transform program structure for performance optimization, visualization, analysis, and hardware integration. We study the different designs for program capture and transformation used in deep learning. By designing for typical deep learning use cases rather than long tail ones, it is possible to create a simpler framework for program capture and transformation. We apply this principle in torch.fx, a program capture and transformation library for PyTorch written entirely in Python and optimized for high developer productivity by ML practitioners. We present case studies showing how torch.fx enables workflows previously inaccessible in the PyTorch ecosystem.
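torch.fx 的典型用法是:先对模块做符号追踪得到图表示,再遍历并改写图节点,最后重新编译;下面是一个基于 PyTorch 公开 API 的可运行小例子(把图中对 relu 的调用替换为 gelu),用于直观说明"程序捕获与转换"的含义。

```python
import torch
import torch.fx

class MyModule(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1.0

# 1) 符号追踪:得到 GraphModule,其 .graph 是可遍历、可改写的中间表示
gm = torch.fx.symbolic_trace(MyModule())
print(gm.graph)

# 2) 图变换:把对 torch.relu 的调用替换为 gelu
for node in gm.graph.nodes:
    if node.op == "call_function" and node.target is torch.relu:
        node.target = torch.nn.functional.gelu
gm.graph.lint()      # 检查改写后图的合法性
gm.recompile()       # 3) 重新生成 Python 代码

print(gm.code)
print(gm(torch.randn(3)))
```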

【17】 Neural Network-based Power Flow Model 标题:基于神经网络的潮流模型 链接:https://arxiv.org/abs/2112.08418

作者:Thuan Pham,Xingpeng Li 摘要:潮流分析用于评估电力系统网络中的潮流。潮流计算用于确定系统的稳态变量,如各母线的电压幅值/相位角以及各支路上的有功/无功潮流。直流潮流模型是一种广泛应用于电力行业的线性潮流模型。虽然它是快速和稳健的,但它可能会导致一些关键输电线路的线流结果不准确。这个缺点可以通过利用历史网格配置文件的数据驱动方法部分解决。在本文中,神经网络(NN)模型的训练,以预测电力系统的历史数据的潮流结果。虽然训练过程可能需要时间,但一旦训练完成,估计线路流量的速度非常快。对所提出的基于神经网络的潮流模型与传统的直流潮流模型进行了综合性能分析。结果表明,与直流潮流模型相比,基于神经网络的潮流模型能够快速、准确地求解。 摘要:Power flow analysis is used to evaluate the flow of electricity in the power system network. Power flow calculation is used to determine the steady-state variables of the system, such as the voltage magnitude /phase angle of each bus and the active/reactive power flow on each branch. The DC power flow model is a popular linear power flow model that is widely used in the power industry. Although it is fast and robust, it may lead to inaccurate line flow results for some critical transmission lines. This drawback can be partially addressed by data-driven methods that take advantage of historical grid profiles. In this paper, a neural network (NN) model is trained to predict power flow results using historical power system data. Although the training process may take time, once trained, it is very fast to estimate line flows. A comprehensive performance analysis between the proposed NN-based power flow model and the traditional DC power flow model is conducted. It can be concluded that the proposed NN-based power flow model can find solutions quickly and more accurately than DC power flow model.
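思路上,这类模型就是用神经网络拟合"节点注入功率到线路潮流"的映射;下面用一个随机生成的线性"分布因子"矩阵来合成替代训练数据做演示(真实场景应使用历史潮流断面),网络规模与训练轮数均为假设值。

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_bus, n_line, n_samples = 14, 20, 2048

# 假设数据:用随机"分布因子"矩阵模拟 注入功率->线路潮流 的近似线性关系(仅作演示)
ptdf = torch.randn(n_line, n_bus) * 0.1
inj = torch.randn(n_samples, n_bus)
flow = inj @ ptdf.T + 0.01 * torch.randn(n_samples, n_line)

model = nn.Sequential(nn.Linear(n_bus, 64), nn.ReLU(), nn.Linear(64, n_line))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(200):
    pred = model(inj)
    loss = nn.functional.mse_loss(pred, flow)
    opt.zero_grad(); loss.backward(); opt.step()

print(f"train MSE: {loss.item():.4f}")   # 训练完成后,一次前向即可估计所有线路潮流
```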

【18】 Machine Learning Kreuzer--Skarke Calabi--Yau Threefolds 标题:机器学习Kreuzer--Skarke Calabi--Yau Threefolds 链接:https://arxiv.org/abs/2112.09117

作者:Per Berglund,Ben Campbell,Vishnu Jejjala 备注:16 pages, 4 figures 摘要:利用全连通前馈神经网络,我们研究了一类Calabi--Yau流形的拓扑不变量,这些流形构造为与Kreuzer--Skarke数据库中的自反多面体相关的复曲面变体中的超曲面。特别是,我们发现了欧拉数的一个简单表达式的存在性,该表达式可以通过从多面体及其对偶体中提取的有限数据来学习。 摘要:Using a fully connected feedforward neural network we study topological invariants of a class of Calabi--Yau manifolds constructed as hypersurfaces in toric varieties associated with reflexive polytopes from the Kreuzer--Skarke database. In particular, we find the existence of a simple expression for the Euler number that can be learned in terms of limited data extracted from the polytope and its dual.

【19】 The Dual PC Algorithm for Structure Learning 标题:用于结构学习的双PC算法 链接:https://arxiv.org/abs/2112.09036

作者:Enrico Giudice,Jack Kuipers,Giusi Moffa 摘要:虽然从观测数据中学习贝叶斯网络的图形结构是描述和帮助理解复杂应用中数据生成过程的关键,但由于其计算复杂性,该任务带来了相当大的挑战。代表贝叶斯网络模型的有向无环图(DAG)通常无法从观测数据中识别,存在多种方法来估计其等价类。在某些假设下,流行的PC算法可以通过测试条件独立性(CI),从边缘独立关系开始,逐步扩展条件集,一致地恢复正确的等价类。在这里,我们提出了双PC算法,这是一种利用协方差和精度矩阵之间的逆关系在PC算法中执行CI测试的新方案。值得注意的是,精度矩阵的元素与高斯数据的偏相关一致。然后,我们的算法利用协方差矩阵和精度矩阵上的块矩阵求逆,同时对互补(或对偶)条件集的偏相关进行测试。因此,双PC算法的多重CI测试首先考虑边缘和全阶CI关系,然后逐步转移到中心阶CI关系。仿真研究表明,双PC算法在运行时间和恢复底层网络结构方面均优于经典PC算法。 摘要:While learning the graphical structure of Bayesian networks from observational data is key to describing and helping understand data generating processes in complex applications, the task poses considerable challenges due to its computational complexity. The directed acyclic graph (DAG) representing a Bayesian network model is generally not identifiable from observational data, and a variety of methods exist to estimate its equivalence class instead. Under certain assumptions, the popular PC algorithm can consistently recover the correct equivalence class by testing for conditional independence (CI), starting from marginal independence relationships and progressively expanding the conditioning set. Here, we propose the dual PC algorithm, a novel scheme to carry out the CI tests within the PC algorithm by leveraging the inverse relationship between covariance and precision matrices. Notably, the elements of the precision matrix coincide with partial correlations for Gaussian data. Our algorithm then exploits block matrix inversions on the covariance and precision matrices to simultaneously perform tests on partial correlations of complementary (or dual) conditioning sets. The multiple CI tests of the dual PC algorithm, therefore, proceed by first considering marginal and full-order CI relationships and progressively moving to central-order ones. Simulation studies indicate that the dual PC algorithm outperforms the classical PC algorithm both in terms of run time and in recovering the underlying network structure.
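该算法依赖的关键事实是:对高斯数据,给定其余全部变量时 X_i 与 X_j 的偏相关可以直接由精度矩阵(协方差的逆)读出,即 rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj)。下面用 NumPy 做一个数值核对;显著性检验与分块矩阵求逆等细节从略。

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 4
A = rng.normal(size=(d, d))
X = rng.multivariate_normal(np.zeros(d), A @ A.T + d * np.eye(d), size=n)

theta = np.linalg.inv(np.cov(X, rowvar=False))           # 精度矩阵

def partial_corr_from_precision(theta, i, j):
    return -theta[i, j] / np.sqrt(theta[i, i] * theta[j, j])

print(partial_corr_from_precision(theta, 0, 1))           # 由精度矩阵直接读出的偏相关

# 核对:对其余变量回归后取残差的相关系数,应与上式一致(在采样误差范围内)
def resid(target, rest=(2, 3)):
    Z = np.c_[np.ones(n), X[:, rest]]
    beta, *_ = np.linalg.lstsq(Z, X[:, target], rcond=None)
    return X[:, target] - Z @ beta

print(np.corrcoef(resid(0), resid(1))[0, 1])
```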

【20】 Interference Suppression Using Deep Learning: Current Approaches and Open Challenges 标题:利用深度学习抑制干扰:当前方法和面临的挑战 链接:https://arxiv.org/abs/2112.08988

作者:Taiwo Oyedare,Vijay K Shah,Daniel J Jakubisin,Jeff H Reed 备注:26 pages, 10 figures, journal article 摘要:鉴于无线频谱的有限性以及最近无线通信技术的突破对频谱使用的需求不断增加,干扰问题仍然存在。尽管最近在解决干扰问题方面取得了进展,但干扰仍然对频谱的有效利用提出了困难的挑战。这部分是由于Wi-Fi、长期演进(LTE)无许可(LTE-U)、LTE许可辅助接入(LAA)、5G NR和其他机会主义频谱接入解决方案使用无许可和管理共享频段的增加。因此,对抗干扰能力强的高效频谱使用方案的需求从未如此重要。在过去,大多数干扰解决方案都通过使用避免技术以及非人工智能缓解方法(例如,自适应滤波器)来解决该问题。非人工智能技术的主要缺点是,在提取或利用信号特征(如干扰信号的循环平稳性、带宽和调制)方面需要领域专家。最近,研究人员成功地探索了AI/ML支持的物理(PHY)层技术,特别是深度学习,它可以减少或补偿干扰信号,而不是简单地避免干扰信号。基于ML的方法的基本思想是从数据中学习干扰或干扰特征,从而避免在抑制干扰时需要领域专家。在这篇文章中,我们回顾了一系列使用深度学习来抑制干扰的技术。我们为许多不同类型的干扰抑制深度学习技术提供了比较和指导。此外,我们强调了在干扰抑制中成功采用深度学习的挑战和潜在的未来研究方向。 摘要:In light of the finite nature of the wireless spectrum and the increasing demand for spectrum use arising from recent technological breakthroughs in wireless communication, the problem of interference continues to persist. Despite recent advancements in resolving interference issues, interference still presents a difficult challenge to effective usage of the spectrum. This is partly due to the rise in the use of license-free and managed shared bands for Wi-Fi, long term evolution (LTE) unlicensed (LTE-U), LTE licensed assisted access (LAA), 5G NR, and other opportunistic spectrum access solutions. As a result of this, the need for efficient spectrum usage schemes that are robust against interference has never been more important. In the past, most solutions to interference have addressed the problem by using avoidance techniques as well as non-AI mitigation approaches (for example, adaptive filters). The key downside to non-AI techniques is the need for domain expertise in the extraction or exploitation of signal features such as cyclostationarity, bandwidth and modulation of the interfering signals. More recently, researchers have successfully explored AI/ML enabled physical (PHY) layer techniques, especially deep learning which reduces or compensates for the interfering signal instead of simply avoiding it. The underlying idea of ML based approaches is to learn the interference or the interference characteristics from the data, thereby sidelining the need for domain expertise in suppressing the interference. In this paper, we review a wide range of techniques that have used deep learning to suppress interference. We provide comparison and guidelines for many different types of deep learning techniques in interference suppression. In addition, we highlight challenges and potential future research directions for the successful adoption of deep learning in interference suppression.

【21】 Quantum Model Learning Agent: characterisation of quantum systems through machine learning 标题:量子模型学习Agent:通过机器学习来表征量子系统 链接:https://arxiv.org/abs/2112.08409

作者:Brian Flynn,Antonio Andreas Gentile,Nathan Wiebe,Raffaele Santagati,Anthony Laing 备注:29 pages, 7 figures 摘要:真实量子系统的精确模型对于研究它们的行为很重要,但很难从经验中提取。在这里,我们报告了一种算法——量子模型学习代理(QMLA)——对目标系统的哈密顿描述进行反向工程。我们在大量模拟实验中测试了QMLA的性能,展示了设计候选哈密顿模型的几种机制,同时对控制所研究系统的物理相互作用的性质提出了许多假设。在大多数情况下,当提供有限的先验信息和控制实验设置时,QMLA可以识别真实模型。我们的协议可以并行探索伊辛、海森堡和哈伯德模型族,可靠地确定最能描述系统动力学的模型族。我们通过引入一个遗传算法来建立新的假设模型,来演示QMLA在大模型空间上的操作。其特征传播到下一代的模型的选择基于一个目标函数,该目标函数受Elo评级方案的启发,通常用于对国际象棋和足球等游戏中的竞争对手进行评级。在所有情况下,我们的协议发现,与真实模型相比,模型表现出$F_1$-分数$\geq 0.88$,并且在72%的情况下准确识别真实模型,同时探索超过$250,000$个潜在模型的空间。通过测试目标系统中实际发生的相互作用,QMLA是探索基础物理和描述和校准量子器件的可行工具。 摘要:Accurate models of real quantum systems are important for investigating their behaviour, yet are difficult to distill empirically. Here, we report an algorithm -- the Quantum Model Learning Agent (QMLA) -- to reverse engineer Hamiltonian descriptions of a target system. We test the performance of QMLA on a number of simulated experiments, demonstrating several mechanisms for the design of candidate Hamiltonian models and simultaneously entertaining numerous hypotheses about the nature of the physical interactions governing the system under study. QMLA is shown to identify the true model in the majority of instances, when provided with limited a priori information, and control of the experimental setup. Our protocol can explore Ising, Heisenberg and Hubbard families of models in parallel, reliably identifying the family which best describes the system dynamics. We demonstrate QMLA operating on large model spaces by incorporating a genetic algorithm to formulate new hypothetical models. The selection of models whose features propagate to the next generation is based upon an objective function inspired by the Elo rating scheme, typically used to rate competitors in games such as chess and football. In all instances, our protocol finds models that exhibit $F_1$-score $\geq 0.88$ when compared with the true model, and it precisely identifies the true model in 72% of cases, whilst exploring a space of over $250,000$ potential models. By testing which interactions actually occur in the target system, QMLA is a viable tool for both the exploration of fundamental physics and the characterisation and calibration of quantum devices.

【22】 Breeding realistic D-brane models 标题:培育逼真的D-膜模型 链接:https://arxiv.org/abs/2112.08391

作者:Gregory J. Loges,Gary Shiu 备注:19 pages + appendices, 9 figures 摘要:交叉膜提供了一种有用的机制,可以从弦理论构建具有各种理想特性的粒子物理模型。这类模型的前景可能是巨大的,向现象学上最有趣的区域导航可能具有挑战性。机器学习技术可以用来有效地构造大量一致的和现象学上需要的模型。在这项工作中,我们用遗传算法来描述寻找一致的交叉D膜模型的问题,遗传算法模拟自然选择,使种群集体进化到最优解。对于具有相交D6膜的四维${\cal N}=1$超对称IIA型定向褶皱,我们证明$\mathcal{O}(10^6)$个唯一、完全一致的模型可以很容易地构造出来,并且,通过明智地选择搜索环境和超参数,所找到的模型中有$\mathcal{O}(30\%)$包含所需的标准模型规范群因子。有了一个相当大的样本,我们就可以得出一些交叉膜模型的初步景观统计数据,包括有无标准模型规范因子限制两种情形。 摘要:Intersecting branes provide a useful mechanism to construct particle physics models from string theory with a wide variety of desirable characteristics. The landscape of such models can be enormous, and navigating towards regions which are most phenomenologically interesting is potentially challenging. Machine learning techniques can be used to efficiently construct large numbers of consistent and phenomenologically desirable models. In this work we phrase the problem of finding consistent intersecting D-brane models in terms of genetic algorithms, which mimic natural selection to evolve a population collectively towards optimal solutions. For a four-dimensional ${\cal N}=1$ supersymmetric type IIA orientifold with intersecting D6-branes, we demonstrate that $\mathcal{O}(10^6)$ unique, fully consistent models can be easily constructed, and, by a judicious choice of search environment and hyper-parameters, $\mathcal{O}(30\%)$ of the found models contain the desired Standard Model gauge group factor. Having a sizable sample allows us to draw some preliminary landscape statistics of intersecting brane models both with and without the restriction of having the Standard Model gauge factor.

其他(18篇)

【1】 IS-COUNT: Large-scale Object Counting from Satellite Images with Covariate-based Importance Sampling 标题:IS-Count:基于协变量重要性采样的卫星图像大尺度目标计数 链接:https://arxiv.org/abs/2112.09126

作者:Chenlin Meng,Enci Liu,Willie Neiswanger,Jiaming Song,Marshall Burke,David Lobell,Stefano Ermon 备注:AAAI 2022 摘要:在许多环境和社会经济监测应用中,高分辨率卫星图像中的目标检测正在成为地面调查数据收集的可扩展替代方案。然而,由于购买图像和计算的成本很高,在大型地理区域执行目标检测的成本仍然高得令人望而却步。受传统调查数据收集策略的启发,我们提出了一种通过抽样估计大型地理区域的对象计数统计数据的方法。在给定成本预算的情况下,我们的方法通过从可学习的提案分布中抽样来选择少量具有代表性的领域。与穷举方法相比,使用重要性抽样,我们能够在仅处理一小部分图像后准确估计对象计数。我们的经验表明,所提出的框架在估算美国和非洲的建筑数量、肯尼亚的汽车数量、孟加拉国的砖窑数量和美国的游泳池数量方面取得了很好的效果,而与穷举法相比,只需要0.01%的卫星图像。 摘要:Object detection in high-resolution satellite imagery is emerging as a scalable alternative to on-the-ground survey data collection in many environmental and socioeconomic monitoring applications. However, performing object detection over large geographies can still be prohibitively expensive due to the high cost of purchasing imagery and compute. Inspired by traditional survey data collection strategies, we propose an approach to estimate object count statistics over large geographies through sampling. Given a cost budget, our method selects a small number of representative areas by sampling from a learnable proposal distribution. Using importance sampling, we are able to accurately estimate object counts after processing only a small fraction of the images compared to an exhaustive approach. We show empirically that the proposed framework achieves strong performance on estimating the number of buildings in the United States and Africa, cars in Kenya, brick kilns in Bangladesh, and swimming pools in the U.S., while requiring as few as 0.01% of satellite images compared to an exhaustive approach.
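其核心是经典的重要性采样估计:区域总数 N = sum_i c_i 可由 N_hat = (1/n) * sum_k c(x_k)/q(x_k) 无偏估计,其中 x_k ~ q 为被抽中的区域、c 为该区域内的目标计数;下面用合成数据演示该估计量。提议分布 q 在论文中是可学习的、基于协变量的,这里用一个固定的示例分布代替。

```python
import numpy as np

rng = np.random.default_rng(0)
n_regions = 10_000
counts = rng.poisson(lam=rng.gamma(2.0, 3.0, size=n_regions))   # 各区域的真实目标数(合成)
true_total = counts.sum()

# 假设的协变量(如夜间灯光强度)与计数正相关,用它构造提议分布 q
covariate = counts + rng.normal(0, 5, size=n_regions).clip(min=0) + 1.0
q = covariate / covariate.sum()

n_samples = 200                                                   # 只"处理"极少数区域
idx = rng.choice(n_regions, size=n_samples, replace=True, p=q)
estimate = np.mean(counts[idx] / q[idx])                          # 重要性采样无偏估计

print(f"true total = {true_total},  IS estimate = {estimate:.0f}  "
      f"(仅处理 {n_samples / n_regions:.1%} 的区域)")
```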

【2】 RegionCLIP: Region-based Language-Image Pretraining 标题:RegionCLIP:基于区域的语言图像预训练 链接:https://arxiv.org/abs/2112.09106

作者:Yiwu Zhong,Jianwei Yang,Pengchuan Zhang,Chunyuan Li,Noel Codella,Liunian Harold Li,Luowei Zhou,Xiyang Dai,Lu Yuan,Yin Li,Jianfeng Gao 备注:Technical report 摘要:使用图像-文本对的对比语言图像预训练(CLIP)在Zero-Shot和迁移学习环境下的图像分类方面都取得了令人印象深刻的结果。然而,我们发现,直接应用此类模型识别图像区域进行目标检测会导致性能低下,因为域转移:剪辑被训练为将图像作为一个整体与文本描述相匹配,而没有捕获图像区域和文本跨度之间的细粒度对齐。为了缓解这个问题,我们提出了一种称为RegionCLIP的新方法,该方法显著扩展了CLIP以学习区域级视觉表示,从而实现图像区域和文本概念之间的细粒度对齐。我们的方法利用剪辑模型将图像区域与模板标题进行匹配,然后对模型进行预训练,以便在特征空间中对齐这些区域文本对。当将我们的预训练模型转换为开放词汇表对象检测任务时,我们的方法在COCO和LVIS数据集上的新类别分别显著优于最新的3.8 AP50和2.2 AP。此外,学习到的区域表示支持Zero-Shot推断用于目标检测,在COCO和LVIS数据集上都显示了有希望的结果。我们的代码可在https://github.com/microsoft/RegionCLIP. 摘要:Contrastive language-image pretraining (CLIP) using image-text pairs has achieved impressive results on image classification in both zero-shot and transfer learning settings. However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans. To mitigate this issue, we propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. Our method leverages a CLIP model to match image regions with template captions and then pretrains our model to align these region-text pairs in the feature space. When transferring our pretrained model to the open-vocabulary object detection tasks, our method significantly outperforms the state of the art by 3.8 AP50 and 2.2 AP for novel categories on COCO and LVIS datasets, respectively. Moreoever, the learned region representations support zero-shot inference for object detection, showing promising results on both COCO and LVIS datasets. Our code is available at https://github.com/microsoft/RegionCLIP.

【3】 Solving Inverse Problems with NerfGANs 标题:用神经网络求解反问题 链接:https://arxiv.org/abs/2112.09061

作者:Giannis Daras,Wen-Sheng Chu,Abhishek Kumar,Dmitry Lagun,Alexandros G. Dimakis 备注:16 pages, 18 figures 摘要:我们介绍了一种使用NeRF风格生成模型求解反问题的新框架。我们感兴趣的是给定单个二维图像和已知摄像机参数的三维场景重建问题。我们表明,天真地优化潜在空间会导致伪影和糟糕的新视图渲染。我们将此问题归因于三维几何中明显存在的体积遮挡物,它们在新视图的渲染中变得可见。我们提出了一种新的辐射场正则化方法,以在单视图观测的情况下获得更好的三维曲面和改进的新视图。我们的方法自然地扩展到一般的反问题,包括仅部分观察单个视图的图像修复。我们通过实验评估了我们的方法,在广泛的任务中实现了视觉改善和性能提升。与以前的先进技术相比,我们的方法实现了$30-40\%$的MSE降低和$15-25\%$的LPIPS损失降低。 摘要:We introduce a novel framework for solving inverse problems using NeRF-style generative models. We are interested in the problem of 3-D scene reconstruction given a single 2-D image and known camera parameters. We show that naively optimizing the latent space leads to artifacts and poor novel view rendering. We attribute this problem to volume obstructions that are clear in the 3-D geometry and become visible in the renderings of novel views. We propose a novel radiance field regularization method to obtain better 3-D surfaces and improved novel views given single view observations. Our method naturally extends to general inverse problems including inpainting where one observes only partially a single view. We experimentally evaluate our method, achieving visual improvements and performance boosts over the baselines in a wide range of tasks. Our method achieves $30-40\%$ MSE reduction and $15-25\%$ reduction in LPIPS loss compared to the previous state of the art.

【4】 Towards Robust Real-time Audio-Visual Speech Enhancement 标题:面向鲁棒实时视听语音增强的研究 链接:https://arxiv.org/abs/2112.09060

作者:Mandar Gogate,Kia Dashtipour,Amir Hussain 摘要:人类大脑在上下文中利用异质的感觉信息来有效地执行包括视觉和听觉在内的认知任务。例如,在鸡尾酒会的情况下,人类的听觉皮层上下文整合视听(AV)线索,以便更好地感知语音。最近的研究表明,与纯音频语音增强(SE)模型相比,AV语音增强(SE)模型可以显著提高极低信噪比(SNR)环境下的语音质量和可懂度。然而,尽管在AV SE领域进行了大量研究,但开发低延迟的实时处理模型仍然是一项艰巨的技术挑战。在本文中,我们提出了一种新的低延迟非特定人AVSE框架,该框架可以推广到一系列视觉和声学噪声。特别地,提出了一种生成性对抗网络(GAN)来解决AV-SE中视觉缺陷的实际问题。此外,我们提出了一种基于深度神经网络的实时AV SE模型,该模型考虑了来自GAN的干净视觉语音输出,以提供更鲁棒的SE。使用客观的语音质量和可懂度指标以及主观列表测试,在合成和真实的有噪声AV语料库上对所提出的框架进行了评估。对比仿真结果表明,我们的实时AV SE框架优于最先进的SE方法,包括最新的基于DNN的SE模型。 摘要:The human brain contextually exploits heterogeneous sensory information to efficiently perform cognitive tasks including vision and hearing. For example, during the cocktail party situation, the human auditory cortex contextually integrates audio-visual (AV) cues in order to better perceive speech. Recent studies have shown that AV speech enhancement (SE) models can significantly improve speech quality and intelligibility in very low signal to noise ratio (SNR) environments as compared to audio-only SE models. However, despite significant research in the area of AV SE, development of real-time processing models with low latency remains a formidable technical challenge. In this paper, we present a novel framework for low latency speaker-independent AV SE that can generalise on a range of visual and acoustic noises. In particular, a generative adversarial networks (GAN) is proposed to address the practical issue of visual imperfections in AV SE. In addition, we propose a deep neural network based real-time AV SE model that takes into account the cleaned visual speech output from GAN to deliver more robust SE. The proposed framework is evaluated on synthetic and real noisy AV corpora using objective speech quality and intelligibility metrics and subjective listing tests. Comparative simulation results show that our real time AV SE framework outperforms state-of-the-art SE approaches, including recent DNN based SE models.

【5】 Neural Style Transfer and Unpaired Image-to-Image Translation to deal with the Domain Shift Problem on Spheroid Segmentation 标题:基于神经样式转换和不成对图像到图像转换的椭球体分割中的域漂移问题 链接:https://arxiv.org/abs/2112.09043

作者:Manuel García-Domínguez,César Domínguez,Jónathan Heras,Eloy Mata,Vico Pascual 摘要:背景和目标。域转移是机器学习模型的一个泛化问题,当训练集的数据分布与模型部署时遇到的数据分布不同时,就会出现这种问题。由于实验条件、设备和捕获设置的变化,这在生物医学图像分割中很常见。在这项工作中,我们通过研究肿瘤球体分割背景下的神经风格转换算法和未配对图像到图像的转换方法来应对这一挑战。方法。我们用4种深度学习分割模型说明了球体分割中的域转移问题:当使用符合训练分布的图像进行测试时,这些模型的IoU超过97%,但当应用于在不同条件下捕获的图像时,其性能下降到84%。为了解决这个问题,我们探索了3种风格转换算法(NST、深度图像类比和STROTSS)和6种未配对图像到图像转换算法(CycleGAN、DualGAN、ForkGAN、GANILLA、CUT和FastCUT)。这些算法已集成到一个高级API中,该API有助于将它们应用到发生域转移问题的其他场景中。结果。通过使用风格转换和图像到图像转换算法,我们大大提高了这4种分割模型应用于不同条件下捕获的图像时的性能。特别是,有2种风格转换算法(NST和深度图像类比)和1种未配对图像到图像转换算法(CycleGAN),可将模型的IoU提高0.24到76.07不等。因此,其性能已接近这些模型应用于符合训练分布的图像时所获得的性能。 摘要:Background and objectives. Domain shift is a generalisation problem of machine learning models that occurs when the data distribution of the training set is different to the data distribution encountered by the model when it is deployed. This is common in the context of biomedical image segmentation due to the variance of experimental conditions, equipment, and capturing settings. In this work, we address this challenge by studying both neural style transfer algorithms and unpaired image-to-image translation methods in the context of the segmentation of tumour spheroids. Methods. We have illustrated the domain shift problem in the context of spheroid segmentation with 4 deep learning segmentation models that achieved an IoU over 97% when tested with images following the training distribution, but whose performance decreased up to an 84% when applied to images captured under different conditions. In order to deal with this problem, we have explored 3 style transfer algorithms (NST, deep image analogy, and STROTSS), and 6 unpaired image-to-image translations algorithms (CycleGAN, DualGAN, ForkGAN, GANILLA, CUT, and FastCUT). These algorithms have been integrated into a high-level API that facilitates their application to other contexts where the domain-shift problem occurs. Results. We have considerably improved the performance of the 4 segmentation models when applied to images captured under different conditions by using both style transfer and image-to-image translation algorithms. In particular, there are 2 style transfer algorithms (NST and deep image analogy) and 1 unpaired image-to-image translations algorithm (CycleGAN) that improve the IoU of the models in a range from 0.24 to 76.07. Therefore, reaching a similar performance to the one obtained when the models are applied to images following the training distribution.

【6】 Challenges and Solutions to Build a Data Pipeline to Identify Anomalies in Enterprise System Performance 标题:构建数据管道以识别企业系统性能异常的挑战和解决方案 链接:https://arxiv.org/abs/2112.08940

作者:Xiaobo Huang,Amitabha Banerjee,Chien-Chia Chen,Chengzhi Huang,Tzu Yi Chuang,Abhishek Srivastava,Razvan Cheveresan 备注:None 摘要:我们将讨论VMware如何解决以下难题,以利用数据操作我们基于ML的异常检测系统,从而检测我们的软件定义数据中心(SDDC)企业部署中的性能问题:(i)由于严重依赖不可缩放的人工注释器,标签稀缺和标签偏差,以及(ii)由于不断变化的工作负载模式、软件堆栈和底层硬件而导致的数据漂移。我们的异常检测系统已在生产中部署多年,并已成功检测到许多重大性能问题。我们证明,通过解决这些数据挑战,我们不仅将性能异常检测模型的准确性提高了30%,而且还确保了模型性能不会随时间而降低。 摘要:We discuss how VMware is solving the following challenges to harness data to operate our ML-based anomaly detection system to detect performance issues in our Software Defined Data Center (SDDC) enterprise deployments: (i) label scarcity and label bias due to heavy dependency on unscalable human annotators, and (ii) data drifts due to ever-changing workload patterns, software stack and underlying hardware. Our anomaly detection system has been deployed in production for many years and has successfully detected numerous major performance issues. We demonstrate that by addressing these data challenges, we not only improve the accuracy of our performance anomaly detection model by 30%, but also ensure that the model performance to never degrade over time.

【7】 Intelli-Paint: Towards Developing Human-like Painting Agents 标题:INTILI-PAINT:发展仿人涂饰剂 链接:https://arxiv.org/abs/2112.08930

作者:Jaskirat Singh,Cameron Smith,Jose Echevarria,Liang Zheng 摘要:生成设计良好的艺术品通常非常耗时,并且假定人类画家具有高度的熟练程度。为了促进人类的绘画过程,已经在教机器如何“像人类一样绘画”方面进行了大量的研究,然后使用经过训练的代理作为人类用户的绘画辅助工具。然而,当前这方面的研究通常依赖于基于网格的渐进式分割策略,其中代理将整个图像分割为连续的更精细网格,然后并行绘制每个网格。这不可避免地导致人工绘画序列,人类用户不容易理解。为了解决这个问题,我们提出了一种新的绘画方法,它可以学习生成输出画布,同时展示更人性化的绘画风格。建议的绘制管道Intelli Paint由1)渐进分层策略组成,该策略允许代理首先绘制自然背景场景表示,然后以渐进方式添加每个前景对象。2) 我们还介绍了一种新的顺序笔画引导策略,它可以帮助绘画代理以语义感知的方式在不同的图像区域之间转移注意力。3) 最后,我们提出了一种笔画规则化策略,该策略允许所需笔画总数减少约60-80%,而生成画布的质量没有任何明显差异。通过定量和定性结果,我们表明,生成的代理不仅提高了输出画布生成的效率,而且展示了更自然的绘画风格,这将更好地帮助人类用户通过数字艺术品表达他们的想法。 摘要:The generation of well-designed artwork is often quite time-consuming and assumes a high degree of proficiency on part of the human painter. In order to facilitate the human painting process, substantial research efforts have been made on teaching machines how to "paint like a human", and then using the trained agent as a painting assistant tool for human users. However, current research in this direction is often reliant on a progressive grid-based division strategy wherein the agent divides the overall image into successively finer grids, and then proceeds to paint each of them in parallel. This inevitably leads to artificial painting sequences which are not easily intelligible to human users. To address this, we propose a novel painting approach which learns to generate output canvases while exhibiting a more human-like painting style. The proposed painting pipeline Intelli-Paint consists of 1) a progressive layering strategy which allows the agent to first paint a natural background scene representation before adding in each of the foreground objects in a progressive fashion. 2) We also introduce a novel sequential brushstroke guidance strategy which helps the painting agent to shift its attention between different image regions in a semantic-aware manner. 3) Finally, we propose a brushstroke regularization strategy which allows for ~60-80% reduction in the total number of required brushstrokes without any perceivable differences in the quality of the generated canvases. Through both quantitative and qualitative results, we show that the resulting agents not only show enhanced efficiency in output canvas generation but also exhibit a more natural-looking painting style which would better assist human users express their ideas through digital artwork.

【8】 Ditch the Gold Standard: Re-evaluating Conversational Question Answering 标题:抛弃黄金标准:重新评估会话问答 链接:https://arxiv.org/abs/2112.08812

作者:Huihan Li,Tianyu Gao,Manan Goenka,Danqi Chen 摘要:会话问答(CQA)系统旨在为用户在寻求信息的对话中提供自然语言的答案。现有的CQA基准使用会话历史中提供的基本事实答案,将模型与预先收集的人类对话进行比较。目前尚不清楚我们是否可以依靠这种静态评估来开发模型,以及当前的系统是否能够很好地推广到现实世界的人机对话。在这项工作中,我们对最先进的CQA系统进行了第一次大规模的人类评估,人类评估人员与模型对话并判断其答案的正确性。我们发现,人机对话的分布与人机对话的分布有很大的不同,并且在模型排名方面,人机对话和黄金历史评估之间存在分歧。我们进一步研究了如何改进自动评估,并提出了一种基于预测历史的问题重写机制,该机制能更好地与人类的判断相关联。最后,我们讨论了各种建模策略的影响以及未来更好的会话问答系统的发展方向。 摘要:Conversational question answering (CQA) systems aim to provide natural-language answers to users in information-seeking conversations. Existing CQA benchmarks compare models with pre-collected human-human conversations, using ground-truth answers provided in conversational history. It remains unclear whether we can rely on this static evaluation for model development and whether current systems can well generalize to real-world human-machine conversations. In this work, we conduct the first large-scale human evaluation of state-of-the-art CQA systems, where human evaluators converse with models and judge the correctness of their answers. We find that the distribution of human-machine conversations differs drastically from that of human-human conversations, and there is a disagreement between human and gold-history evaluation in terms of model ranking. We further investigate how to improve automatic evaluations, and propose a question rewriting mechanism based on predicted history, which better correlates with human judgments. Finally, we discuss the impact of various modeling strategies and future directions towards better conversational question answering systems.

【9】 Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing 标题:显著嫁接:无伤大雅的归因导向混合和校准标签混合 链接:https://arxiv.org/abs/2112.08796

作者:Joonhyung Park,June Yong Yang,Jinwoo Shin,Sung Ju Hwang,Eunho Yang 备注:12 pages; Accepted to AAAI2022 摘要:Mixup方案通过混合一对样本来创建增强的训练样本,最近因能提高神经网络的泛化能力而受到相当大的关注。Mixup的一个简单且广泛使用的扩展是与类似区域dropout的方法相结合:从一个样本中移除随机补丁,并用另一个样本中的特征替换它。尽管这些方法简单有效,但由于其随机性,容易产生有害样本。为了解决这个问题,最近提出了"最大显著性"策略:它们只选择信息量最大的特征来防止这种现象。然而,它们又缺乏样本多样性,因为它们总是确定性地选择显著性最大的区域,从而将偏差注入到增强的数据中。在本文中,我们提出了一种新颖而简单的Mixup变体,它兼具两者的优点。我们的想法是双重的。通过对特征进行随机采样并将其"嫁接"到另一个样本上,我们的方法有效地生成多样但有意义的样本。其第二个要素是通过以显著性校准的方式混合标签来生成嫁接样本的标签,从而纠正随机抽样程序引入的监督误导。我们在CIFAR、Tiny ImageNet和ImageNet数据集下的实验表明,我们的方案不仅在分类精度方面优于当前最先进的增强策略,而且在应对数据损坏和对象遮挡等压力条件方面也更胜一筹。 摘要:The Mixup scheme suggests mixing a pair of samples to create an augmented training sample and has gained considerable attention recently for improving the generalizability of neural networks. A straightforward and widely used extension of Mixup is to combine with regional dropout-like methods: removing random patches from a sample and replacing it with the features from another sample. Albeit their simplicity and effectiveness, these methods are prone to create harmful samples due to their randomness. To address this issue, 'maximum saliency' strategies were recently proposed: they select only the most informative features to prevent such a phenomenon. However, they now suffer from lack of sample diversification as they always deterministically select regions with maximum saliency, injecting bias into the augmented data. In this paper, we present a novel, yet simple Mixup-variant that captures the best of both worlds. Our idea is two-fold. By stochastically sampling the features and 'grafting' them onto another sample, our method effectively generates diverse yet meaningful samples. Its second ingredient is to produce the label of the grafted sample by mixing the labels in a saliency-calibrated fashion, which rectifies supervision misguidance introduced by the random sampling procedure. Our experiments under CIFAR, Tiny-ImageNet, and ImageNet datasets show that our scheme outperforms the current state-of-the-art augmentation strategies not only in terms of classification accuracy, but is also superior in coping under stress conditions such as data corruption and object occlusion.
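下面是一个概念性草图:按显著性给各图块一个采样概率、随机抽取图块"嫁接"到另一张图上,并按被嫁接区域的显著性占比校准混合标签;这里把显著性简化为输入梯度的绝对值,图块大小、采样比例与温度等均为假设,并非论文的精确算法。

```python
import torch
import torch.nn.functional as F

def patch_saliency(model, x, y, patch=8):
    """用输入梯度的绝对值作为显著性,并汇聚到 patch x patch 的网格上。x: [1,C,H,W]"""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    sal = x.grad.abs().sum(dim=1, keepdim=True)                  # [1,1,H,W]
    model.zero_grad()                                            # 清掉演示中多余的参数梯度
    return F.avg_pool2d(sal, patch).flatten()                    # 每个图块一个显著性分数

def saliency_graft(model, xa, ya, xb, yb, n_classes, patch=8, ratio=0.3, temp=1.0):
    """从 xb 按显著性概率抽取若干图块贴到 xa 上,标签按嫁接区域的显著性占比混合。"""
    sal_b = patch_saliency(model, xb, yb, patch)
    probs = F.softmax(sal_b / temp, dim=0)
    n_pick = max(1, int(ratio * probs.numel()))
    picked = torch.multinomial(probs, n_pick, replacement=False)  # 随机而非确定性地选块

    grid = xa.size(-1) // patch
    x_new = xa.clone()
    for p in picked.tolist():
        r, c = divmod(p, grid)
        rs, cs = r * patch, c * patch
        x_new[..., rs:rs + patch, cs:cs + patch] = xb[..., rs:rs + patch, cs:cs + patch]

    lam = probs[picked].sum().clamp(0, 1)                         # 标签按显著性质量校准
    y_mix = (1 - lam) * F.one_hot(ya, n_classes).float() + lam * F.one_hot(yb, n_classes).float()
    return x_new, y_mix

if __name__ == "__main__":
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    xa, xb = torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32)
    ya, yb = torch.tensor([1]), torch.tensor([7])
    x_new, y_mix = saliency_graft(model, xa, ya, xb, yb, n_classes=10)
    print(x_new.shape, y_mix)
```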

【10】 δ-SAM: Sharpness-Aware Minimization with Dynamic Reweighting 标题:δ-SAM:动态加权的清晰度感知最小化 链接:https://arxiv.org/abs/2112.08772

作者:Wenxuan Zhou,Muhao Chen 摘要:深层神经网络往往参数化过度,不易实现模型泛化。对抗性训练通过对在对抗性选择的扰动下损失的变化进行正则化,在提高泛化能力方面显示出了有效性。最近提出的锐度感知最小化(SAM)算法采用对抗性权重扰动,鼓励模型收敛到平坦的极小值。不幸的是,由于计算成本的增加,对抗性权重扰动只能在每个批次而非每个实例的粒度上高效近似,从而导致性能下降。在本文中,我们提出在每个批次内对扰动进行动态重加权,对无防护(unguarded)的实例加大权重,以此更好地近似逐实例扰动。我们提出了具有动态重加权的锐度感知最小化(δ-SAM),它通过高效的防护度(guardedness)估计实现了这一思想。GLUE基准测试的实验证明了δ-SAM的有效性。 摘要:Deep neural networks are often overparameterized and may not easily achieve model generalization. Adversarial training has shown effectiveness in improving generalization by regularizing the change of loss on top of adversarially chosen perturbations. The recently proposed sharpness-aware minimization (SAM) algorithm adopts adversarial weight perturbation, encouraging the model to converge to a flat minima. Unfortunately, due to increased computational cost, adversarial weight perturbation can only be efficiently approximated per-batch instead of per-instance, leading to degraded performance. In this paper, we propose that dynamically reweighted perturbation within each batch, where unguarded instances are up-weighted, can serve as a better approximation to per-instance perturbation. We propose sharpness-aware minimization with dynamic reweighting (δ-SAM), which realizes the idea with efficient guardedness estimation. Experiments on the GLUE benchmark demonstrate the effectiveness of δ-SAM.
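下面示意锐度感知最小化的一次更新:先沿梯度方向构造范数为 rho 的权重扰动,在扰动后的权重处再求梯度,并用该梯度更新原始权重;其中"按样本损失归一化"的动态权重只是对 δ-SAM 重加权思想的假设性近似,并非论文的精确公式。

```python
import torch
import torch.nn.functional as F

def delta_sam_step(model, x, y, base_opt, rho=0.05):
    # 1) 按样本损失构造动态权重(示意:损失越大、"无防护"程度越高,权重越大)
    per_sample = F.cross_entropy(model(x), y, reduction="none")
    weights = (per_sample / per_sample.sum()).detach()
    (weights * per_sample).sum().backward()

    # 2) 沿加权梯度方向做范数为 rho 的权重扰动 e = rho * g / ||g||
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None); continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e); eps.append(e)
    model.zero_grad()

    # 3) 在扰动后的权重处计算梯度,再退回原权重并用该梯度更新
    F.cross_entropy(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_opt.step(); base_opt.zero_grad()

if __name__ == "__main__":
    model = torch.nn.Linear(10, 3)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    delta_sam_step(model, torch.randn(16, 10), torch.randint(0, 3, (16,)), opt)
    print("one δ-SAM-style step done")
```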

【11】 IsometricMT: Neural Machine Translation for Automatic Dubbing 标题:IsometricMT:自动配音的神经机器翻译 链接:https://arxiv.org/abs/2112.08682

作者:Surafel M. Lakew,Yogesh Virkar,Prashant Mathur,Marcello Federico 备注:Submitted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022 摘要:自动配音(AD)是翻译应符合给定长度模板的用例之一,以实现源语音和目标语音之间的同步。对于神经机器翻译(MT),生成长度接近源长度(例如在字符数上相差+-10%以内)同时保持质量的翻译是一项具有挑战性的任务。控制NMT输出长度会降低翻译质量,这通常通过两步方法来缓解,即先生成n个最佳假设,然后根据长度和质量对其重新排序。这项工作引入了一种自学习方法,允许Transformer模型直接学习生成与源长度紧密匹配的输出,简称等长翻译(isometric MT)。特别是,我们的等长翻译方法不需要生成多个假设,也不需要任何辅助评分函数。我们报告了四个语言对(英语-法语、意大利语、德语、西班牙语)的结果,并基于TED Talk数据提供了一个公开的基准。自动和手动评估表明,我们的自学习方法可与更复杂的等长翻译方法相媲美。 摘要:Automatic dubbing (AD) is among the use cases where translations should fit a given length template in order to achieve synchronicity between source and target speech. For neural machine translation (MT), generating translations of length close to the source length (e.g. within +-10% in character count), while preserving quality is a challenging task. Controlling NMT output length comes at a cost to translation quality which is usually mitigated with a two step approach of generation of n-best hypotheses and then re-ranking them based on length and quality. This work introduces a self-learning approach that allows a transformer model to directly learn to generate outputs that closely match the source length, in short isometric MT. In particular, our approach for isometric MT does not require to generate multiple hypotheses nor any auxiliary scoring function. We report results on four language pairs (English - French, Italian, German, Spanish) with a publicly available benchmark based on TED Talk data. Both automatic and manual evaluations show that our self-learning approach performs on par with more complex isometric MT approaches.

【12】 Amortized Noisy Channel Neural Machine Translation 标题:折算噪声通道神经机器翻译 链接:https://arxiv.org/abs/2112.08670

作者:Richard Yuanzhe Pang,He He,Kyunghyun Cho 摘要:噪声信道模型在神经机器翻译(NMT)中尤其有效。然而,最近的方法,如“波束搜索和重行”(BSR)在推理过程中会产生大量的计算开销,使得实际应用不可行。我们的目标是建立一个摊销噪声信道NMT模型,以便贪婪地从中解码将生成与使用BSR生成的翻译相同的最大回报的翻译。我们尝试了三种方法:知识提炼、一步偏差模仿学习和Q学习。第一种方法是从伪语料库中获取带噪信道信号,后两种方法是直接针对带噪信道进行优化。这三种方法都将推理速度提高了1-2个数量级。对于所有三种方法,生成的翻译无法获得与BSR相当的回报,但BLEU近似的翻译质量与BSR生成的翻译质量相似。 摘要:Noisy channel models have been especially effective in neural machine translation (NMT). However, recent approaches like "beam search and rerank" (BSR) incur significant computation overhead during inference, making real-world application infeasible. We aim to build an amortized noisy channel NMT model such that greedily decoding from it would generate translations that maximize the same reward as translations generated using BSR. We attempt three approaches: knowledge distillation, 1-step-deviation imitation learning, and Q learning. The first approach obtains the noisy channel signal from a pseudo-corpus, and the latter two approaches aim to optimize toward a noisy-channel MT reward directly. All three approaches speed up inference by 1-2 orders of magnitude. For all three approaches, the generated translations fail to achieve rewards comparable to BSR, but the translation quality approximated by BLEU is similar to the quality of BSR-produced translations.

【13】 Visualizing the Loss Landscape of Winning Lottery Tickets 标题:彩票中奖损失景观的可视化 链接:https://arxiv.org/abs/2112.08538

作者:Robert Bain 备注:7 pages, 7 figures, 1 algorithm/pseudocode 摘要:深层神经网络底层的损失景观(loss landscape)对其训练有很大影响,但由于计算上的限制,以往主要从理论上对其进行研究。这项工作大大减少了计算此类损失景观所需的时间,并将其用于研究通过迭代幅值剪枝发现的中奖彩票。我们还分享了一些结果,它们与先前声称的某些损失景观投影方法与模型可训练性和泛化误差之间的相关性相矛盾。 摘要:The underlying loss landscapes of deep neural networks have a great impact on their training, but they have mainly been studied theoretically due to computational constraints. This work vastly reduces the time required to compute such loss landscapes, and uses them to study winning lottery tickets found via iterative magnitude pruning. We also share results that contradict previously claimed correlations between certain loss landscape projection methods and model trainability and generalization error.
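计算损失景观切片的常见做法是取两个随机方向 d1、d2(完整实现中通常会做滤波器归一化),在网格点 (alpha, beta) 上评估 L(theta + alpha*d1 + beta*d2);下面是一个省略滤波器归一化的简化示意,模型与网格大小均为玩具设置。

```python
import torch
import torch.nn.functional as F

def loss_surface(model, loss_fn, alphas, betas):
    """在 theta + a*d1 + b*d2 的二维网格上评估损失,返回 [len(alphas), len(betas)] 的矩阵。"""
    theta = [p.detach().clone() for p in model.parameters()]
    d1 = [torch.randn_like(p) for p in theta]      # 两个随机方向(此处未做滤波器归一化)
    d2 = [torch.randn_like(p) for p in theta]
    surface = torch.zeros(len(alphas), len(betas))
    with torch.no_grad():
        for i, a in enumerate(alphas):
            for j, b in enumerate(betas):
                for p, t, u, v in zip(model.parameters(), theta, d1, d2):
                    p.copy_(t + a * u + b * v)
                surface[i, j] = loss_fn(model)
        for p, t in zip(model.parameters(), theta):  # 恢复原始参数
            p.copy_(t)
    return surface

if __name__ == "__main__":
    torch.manual_seed(0)
    X, y = torch.randn(128, 10), torch.randint(0, 3, (128,))
    model = torch.nn.Linear(10, 3)
    loss_fn = lambda m: F.cross_entropy(m(X), y)
    grid = torch.linspace(-1, 1, 11)
    print(loss_surface(model, loss_fn, grid, grid).shape)
```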

【14】 ELight: Enabling Efficient Photonic In-Memory Neurocomputing with Life Enhancement 标题:ELIGT:通过生命增强实现高效的光子内存中神经计算 链接:https://arxiv.org/abs/2112.08512

作者:Hanqing Zhu,Jiaqi Gu,Chenghao Feng,Mingjie Liu,Zixuan Jiang,Ray T. Chen,David Z. Pan 备注:7 pages, 8 figures, accepted by ASPDAC 2022 摘要:随着光学相变材料(PCM)的最新进展,光子记忆神经计算在光学神经网络(ONN)设计中显示了其优越性,具有接近零的静态功耗、光传播级的延迟(time-of-light latency)和紧凑的占地面积。然而,由于单核规模有限,光子张量核需要大量的硬件重用来实现大矩阵乘法。由此产生的大量PCM写入导致严重的动态功耗,并使写入耐久性有限的脆弱PCM不堪重负。在这项工作中,我们提出了一个协同优化框架ELight,以最小化高效可靠的光内存神经计算的总体写入工作量。我们首先提出写感知训练来鼓励权重块之间的相似性,并将其与训练后优化方法相结合,通过消除冗余写操作来减少编程工作量。实验表明,ELight可以在保持相当精度的同时,将写入总数和动态功耗减少20倍以上。借助ELight,光子记忆神经计算将朝着机器学习中的可行应用迈进:在保持精度的同时,使用寿命延长一个数量级,编程能耗也更低。 摘要:With the recent advances in optical phase change material (PCM), photonic in-memory neurocomputing has demonstrated its superiority in optical neural network (ONN) designs with near-zero static power consumption, time-of-light latency, and compact footprint. However, photonic tensor cores require massive hardware reuse to implement large matrix multiplication due to the limited single-core scale. The resultant large number of PCM writes leads to serious dynamic power and overwhelms the fragile PCM with limited write endurance. In this work, we propose a synergistic optimization framework, ELight, to minimize the overall write efforts for efficient and reliable optical in-memory neurocomputing. We first propose write-aware training to encourage the similarity among weight blocks, and combine it with a post-training optimization method to reduce programming efforts by eliminating redundant writes. Experiments show that ELight can achieve over 20X reduction in the total number of writes and dynamic power with comparable accuracy. With our ELight, photonic in-memory neurocomputing will step forward towards viable applications in machine learning with preserved accuracy, order-of-magnitude longer lifetime, and lower programming energy.

【15】 The Need for Ethical, Responsible, and Trustworthy Artificial Intelligence for Environmental Sciences 标题:环境科学对道德的、负责任的和值得信赖的人工智能的需求 链接:https://arxiv.org/abs/2112.08453

作者:Amy McGovern,Imme Ebert-Uphoff,David John Gagne II,Ann Bostrom 摘要:鉴于人工智能(AI)和机器学习(ML)方法在环境科学各个方面的应用日益广泛,我们必须开始讨论AI的道德和负责任的使用。事实上,从人工智能引入的其他领域可以学到很多东西,这些领域往往是出于好意,但往往会导致意外的社会后果,如刑事司法系统中的硬编码种族偏见或通过金融系统加剧经济不平等。一个常见的误解是,在使用人工智能时,环境科学不会受到这些意外后果的影响,因为大多数数据来自观测,人工智能算法基于数学公式,而数学公式通常被视为客观的。在本文中,我们认为情况正好相反。通过具体的例子,我们展示了人工智能在环境科学中引入类似结果的许多方法。本文将促进这方面的讨论和研究工作。作为一个社区,我们应该避免通过引入人工智能在其他领域重复任何可预见的错误。事实上,如果采取适当的预防措施,人工智能可以成为帮助减少气候和环境不公的一个伟大工具。我们主要关注天气和气候的例子,但结论广泛应用于环境科学。 摘要:Given the growing use of Artificial Intelligence (AI) and machine learning (ML) methods across all aspects of environmental sciences, it is imperative that we initiate a discussion about the ethical and responsible use of AI. In fact, much can be learned from other domains where AI was introduced, often with the best of intentions, yet often led to unintended societal consequences, such as hard coding racial bias in the criminal justice system or increasing economic inequality through the financial system. A common misconception is that the environmental sciences are immune to such unintended consequences when AI is being used, as most data come from observations, and AI algorithms are based on mathematical formulas, which are often seen as objective. In this article, we argue the opposite can be the case. Using specific examples, we demonstrate many ways in which the use of AI can introduce similar consequences in the environmental sciences. This article will stimulate discussion and research efforts in this direction. As a community, we should avoid repeating any foreseeable mistakes made in other domains through the introduction of AI. In fact, with proper precautions, AI can be a great tool to help reduce climate and environmental injustice. We primarily focus on weather and climate examples but the conclusions apply broadly across the environmental sciences.

【16】 Programmatic Reward Design by Example 标题:基于实例的程序性奖励设计 链接:https://arxiv.org/abs/2112.08438

作者:Weichao Zhou,Wenchao Li 摘要:奖励设计是强化学习中的一个基本问题。错误指定或设计不当的奖励可能会导致低样本效率和不良行为。在本文中，我们提出了程序性奖励设计的思想，即在RL环境中使用程序来指定奖励函数。程序允许人类工程师以结构化和可解释的方式表达子目标和复杂任务场景。然而，程序性奖励设计的挑战在于，尽管人类可以提供高层次的结构，但正确设置低层次的细节，例如为特定子任务设置适当数量的奖励，仍然很困难。本文的主要贡献是一个概率框架，它可以从专家演示中推断出最佳候选程序性奖励函数。受最近生成性对抗方法的启发，我们的框架搜索最可能的程序性奖励函数，在该函数下，最优生成的轨迹无法与演示的轨迹区分。实验结果表明，使用该框架学习的程序性奖励函数可以显著优于使用现有奖励学习算法学习到的奖励函数，并使RL代理能够在高度复杂的任务上实现最先进的性能。 摘要:Reward design is a fundamental problem in reinforcement learning (RL). A misspecified or poorly designed reward can result in low sample efficiency and undesired behaviors. In this paper, we propose the idea of programmatic reward design, i.e. using programs to specify the reward functions in RL environments. Programs allow human engineers to express sub-goals and complex task scenarios in a structured and interpretable way. The challenge of programmatic reward design, however, is that while humans can provide the high-level structures, properly setting the low-level details, such as the right amount of reward for a specific sub-task, remains difficult. A major contribution of this paper is a probabilistic framework that can infer the best candidate programmatic reward function from expert demonstrations. Inspired by recent generative-adversarial approaches, our framework searches for the most likely programmatic reward function under which the optimally generated trajectories cannot be differentiated from the demonstrated trajectories. Experimental results show that programmatic reward functions learned using this framework can significantly outperform those learned using existing reward learning algorithms, and enable RL agents to achieve state-of-the-art performance on highly complex tasks.
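For intuition, the sketch below shows what a programmatic reward function in the sense described above might look like for a hypothetical key-and-door grid-world: the engineer writes the sub-goal structure as a program, while the low-level magnitudes in theta are left as parameters that a demonstration-based inference procedure (such as the probabilistic framework of this paper) would fit. The state fields and parameter names are our own assumptions, not taken from the paper.

def programmatic_reward(state: dict, theta: dict) -> float:
    # Structure fixed by the engineer: pick up the key, then open the door.
    if state.get("has_key") and state.get("at_door"):
        return theta["door"]   # reward for completing the final sub-goal
    if state.get("at_key") and not state.get("has_key"):
        return theta["key"]    # reward for reaching the first sub-goal
    return theta["step"]       # per-step shaping term, typically small or negative

# Hypothetical usage with hand-guessed magnitudes (the values a learning
# procedure would instead infer from expert demonstrations):
print(programmatic_reward({"at_key": True, "has_key": False},
                          {"key": 1.0, "door": 10.0, "step": -0.01}))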

【17】 A new locally linear embedding scheme in light of Hessian eigenmap 标题:一种新的基于Hessian特征映射的局部线性嵌入方案 链接:https://arxiv.org/abs/2112.09086

作者:Liren Lin,Chih-Wei Chen 备注:13 pages 摘要:我们提供了Hessian局部线性嵌入(HLLE)的一种新解释，揭示了它本质上是实现与局部线性嵌入(LLE)相同思想的一种变体方式。基于新的解释，可以进行实质性的简化，其中"Hessian"的概念被相当任意的权重所取代。此外，我们通过数值例子表明，当目标空间的维数大于数据流形的维数时，HLLE可能产生类似于投影的结果，因此建议对流形维数进行进一步修改。结合上述所有观察，我们最终得到了一种新的LLE型方法，称为切向LLE(TLLE)。它比HLLE更简单、更健壮。 摘要:We provide a new interpretation of Hessian locally linear embedding (HLLE), revealing that it is essentially a variant way to implement the same idea of locally linear embedding (LLE). Based on the new interpretation, a substantial simplification can be made, in which the idea of "Hessian" is replaced by rather arbitrary weights. Moreover, we show by numerical examples that HLLE may produce projection-like results when the dimension of the target space is larger than that of the data manifold, and hence one further modification concerning the manifold dimension is suggested. Combining all the observations, we finally achieve a new LLE-type method, which is called tangential LLE (TLLE). It is simpler and more robust than HLLE.
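Since the paper's TLLE is new, the runnable sketch below only contrasts the two existing methods it builds on, standard LLE and Hessian LLE (HLLE), using scikit-learn on a synthetic swiss roll; the dataset, neighbor count, and embedding dimension are our own illustrative choices, not the paper's experiments.

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# Classic LLE: reconstruct each point from its neighbors, then embed.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method="standard")
Y_lle = lle.fit_transform(X)

# Hessian eigenmap variant (HLLE), the method reinterpreted in the abstract above.
hlle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method="hessian")
Y_hlle = hlle.fit_transform(X)

print(Y_lle.shape, Y_hlle.shape)  # both (1500, 2)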

【18】 Sensor Sampling Trade-Offs for Air Quality Monitoring With Low-Cost Sensors 标题:低成本传感器在空气质量监测中的传感器采样权衡 链接:https://arxiv.org/abs/2112.09072

作者:Pau Ferrer-Cid,Julio Garcia-Calvete,Aina Main-Nadal,Zhe Ye,Jose M. Barcelo-Ordinas,Jorge Garcia-Vidal 备注:Submitted to journal, 12 pages, 22 figures 摘要:利用机器学习技术校准低成本传感器是目前广泛使用的一种方法。尽管在部署用于空气质量监测的低成本传感器方面仍有许多挑战有待解决，但低成本传感器已被证明与高精度仪器结合使用非常有用。因此，大多数研究都集中在使用机器学习的不同校准技术的应用上。然而，这些模型的成功应用取决于传感器所获数据的质量，而从传感器采样、数据预处理到传感器本身校准的整个数据采集过程却很少受到关注。在本文中，我们展示了主要的传感器采样参数及其对基于机器学习的传感器校准质量和能耗的相应影响，从而呈现其中存在的权衡。最后，在一个实验节点上的结果显示了数据采样策略在对流层臭氧、二氧化氮和一氧化氮低成本传感器校准中的影响。具体来说，我们展示了最小化传感子系统占空比的采样策略如何在保持数据质量的同时降低功耗。 摘要:The calibration of low-cost sensors using machine learning techniques is a methodology widely used nowadays. Although many challenges remain to be solved in the deployment of low-cost sensors for air quality monitoring, low-cost sensors have been shown to be useful in conjunction with high-precision instrumentation. Thus, most research is focused on the application of different calibration techniques using machine learning. Nevertheless, the successful application of these models depends on the quality of the data obtained by the sensors, and very little attention has been paid to the whole data gathering process, from sensor sampling and data pre-processing, to the calibration of the sensor itself. In this article, we show the main sensor sampling parameters, with their corresponding impact on the quality of the resulting machine learning-based sensor calibration and their impact on energy consumption, thus showing the existing trade-offs. Finally, the results on an experimental node show the impact of the data sampling strategy in the calibration of tropospheric ozone, nitrogen dioxide and nitrogen monoxide low-cost sensors. Specifically, we show how a sampling strategy that minimizes the duty cycle of the sensing subsystem can reduce power consumption while maintaining data quality.
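As a toy illustration of the machine-learning calibration setting described above (our own synthetic sketch, not the authors' pipeline; the sensor response model, feature set, and choice of regressor are assumptions), a low-cost gas sensor can be calibrated against a reference instrument with a standard regressor:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
temp = rng.uniform(10, 35, n)        # air temperature, degC
rh = rng.uniform(20, 90, n)          # relative humidity, %
o3_ref = rng.uniform(10, 120, n)     # reference analyzer reading, ppb
# Synthetic raw low-cost sensor signal with temperature/humidity cross-sensitivity.
# In a duty-cycled deployment, each raw value would itself be the average of the
# samples collected while the sensing subsystem is switched on.
raw = 0.8 * o3_ref + 0.5 * temp - 0.1 * rh + rng.normal(0, 3, n)

X = np.column_stack([raw, temp, rh])
X_tr, X_te, y_tr, y_te = train_test_split(X, o3_ref, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out R^2:", round(model.score(X_te, y_te), 3))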