
Artificial Intelligence Academic Digest [12.17]

Posted: 2023-04-18 15:34:54

cs.AI Artificial Intelligence: 68 papers in total

【1】 HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images
Link: https://arxiv.org/abs/2112.09131

Authors: Ali Athar, Jonathon Luiten, Alexander Hermans, Deva Ramanan, Bastian Leibe
Abstract: Existing state-of-the-art methods for Video Object Segmentation (VOS) learn low-level pixel-to-pixel correspondences between frames to propagate object masks across video. This requires a large amount of densely annotated video data, which is costly to annotate, and largely redundant since frames within a video are highly correlated. In light of this, we propose HODOR: a novel method that tackles VOS by effectively leveraging annotated static images for understanding object appearance and scene context. We encode object instances and scene information from an image frame into robust high-level descriptors which can then be used to re-segment those objects in different frames. As a result, HODOR achieves state-of-the-art performance on the DAVIS and YouTube-VOS benchmarks compared to existing methods trained without video annotations. Without any architectural modification, HODOR can also learn from video context around single annotated video frames by utilizing cyclic consistency, whereas other methods rely on dense, temporally consistent annotations.

【2】 ICON: Implicit Clothed humans Obtained from Normals
Link: https://arxiv.org/abs/2112.09127

Authors: Yuliang Xiu, Jinlong Yang, Dimitrios Tzionas, Michael J. Black
Note: 21 pages, 18 figures, 7 tables. Project page: this https URL
Abstract: Current methods for learning realistic and animatable 3D clothed avatars need either posed 3D scans or 2D images with carefully controlled user poses. In contrast, our goal is to learn the avatar from only 2D images of people in unconstrained poses. Given a set of images, our method estimates a detailed 3D surface from each image and then combines these into an animatable avatar. Implicit functions are well suited to the first task, as they can capture details like hair or clothes. Current methods, however, are not robust to varied human poses and often produce 3D surfaces with broken or disembodied limbs, missing details, or non-human shapes. The problem is that these methods use global feature encoders that are sensitive to global pose. To address this, we propose ICON ("Implicit Clothed humans Obtained from Normals"), which uses local features, instead. ICON has two main modules, both of which exploit the SMPL(-X) body model. First, ICON infers detailed clothed-human normals (front/back) conditioned on the SMPL(-X) normals. Second, a visibility-aware implicit surface regressor produces an iso-surface of a human occupancy field. Importantly, at inference time, a feedback loop alternates between refining the SMPL(-X) mesh using the inferred clothed normals and then refining the normals. Given multiple reconstructed frames of a subject in varied poses, we use SCANimate to produce an animatable avatar from them. Evaluation on the AGORA and CAPE datasets shows that ICON outperforms the state of the art in reconstruction, even with heavily limited training data. Additionally, it is much more robust to out-of-distribution samples, e.g., in-the-wild poses/images and out-of-frame cropping. ICON takes a step towards robust 3D clothed human reconstruction from in-the-wild images. This enables creating avatars directly from video with personalized and natural pose-dependent cloth deformation.

【3】 Human Hands as Probes for Interactive Object Understanding
Link: https://arxiv.org/abs/2112.09120

Authors: Mohit Goyal, Sahil Modi, Rishabh Goyal, Saurabh Gupta
Note: Project website at this https URL
Abstract: Interactive object understanding, or what we can do to objects and how, is a long-standing goal of computer vision. In this paper, we tackle this problem through observation of human hands in in-the-wild egocentric videos. We demonstrate that observing what human hands interact with, and how, can provide both the relevant data and the necessary supervision. Attending to hands readily localizes and stabilizes active objects for learning, and reveals places where interactions with objects occur. Analyzing the hands shows what we can do to objects and how. We apply these basic principles on the EPIC-KITCHENS dataset and successfully learn state-sensitive features and object affordances (regions of interaction and afforded grasps), purely by observing hands in egocentric videos.

【4】 Towards Unsupervised Dense Information Retrieval with Contrastive Learning
Link: https://arxiv.org/abs/2112.09118

Authors: Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave
Abstract: Information retrieval is an important component in natural language processing, for knowledge-intensive tasks such as question answering and fact checking. Recently, information retrieval has seen the emergence of dense retrievers, based on neural networks, as an alternative to classical sparse methods based on term-frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new domains or applications with no training data, and are often outperformed by term-frequency methods such as BM25 which are not supervised. Thus, a natural question is whether it is possible to train dense retrievers without supervision. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers, and show that it leads to strong retrieval performance. More precisely, we show on the BEIR benchmark that our model outperforms BM25 on 11 out of 15 datasets. Furthermore, when a few thousand examples are available, we show that fine-tuning our model on these leads to strong improvements compared to BM25. Finally, when used as pre-training before fine-tuning on the MS-MARCO dataset, our technique obtains state-of-the-art results on the BEIR benchmark.
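
To make the contrastive objective above concrete, here is a minimal InfoNCE-style loss in NumPy, where each query's paired passage embedding acts as the positive and the other passages in the batch act as negatives. This is a generic sketch, not the authors' implementation; the function name and temperature value are illustrative.

```python
import numpy as np

def info_nce_loss(queries, passages, temperature=0.05):
    """Contrastive (InfoNCE) loss: queries[i] should match passages[i],
    with every other passage in the batch serving as a negative."""
    # L2-normalize so dot products become cosine similarities.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = passages / np.linalg.norm(passages, axis=1, keepdims=True)
    logits = q @ p.T / temperature                    # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal of the similarity matrix.
    return -np.mean(np.diag(log_probs))
```

Lowering this loss pulls matching query/passage embeddings together while pushing apart the in-batch negatives; the temperature controls how sharply the softmax concentrates on the closest candidates.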

【5】 RegionCLIP: Region-based Language-Image Pretraining
Link: https://arxiv.org/abs/2112.09106

Authors: Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao
Note: Technical report
Abstract: Contrastive language-image pretraining (CLIP) using image-text pairs has achieved impressive results on image classification in both zero-shot and transfer learning settings. However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans. To mitigate this issue, we propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. Our method leverages a CLIP model to match image regions with template captions and then pretrains our model to align these region-text pairs in the feature space. When transferring our pretrained model to the open-vocabulary object detection tasks, our method significantly outperforms the state of the art by 3.8 AP50 and 2.2 AP for novel categories on COCO and LVIS datasets, respectively. Moreover, the learned region representations support zero-shot inference for object detection, showing promising results on both COCO and LVIS datasets. Our code is available at https://github.com/microsoft/RegionCLIP.
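
The region-to-caption matching step described above can be sketched generically: score every region embedding against every template-caption embedding by cosine similarity and keep the best caption as that region's pseudo-label. This is an illustrative stand-in that assumes region and caption encoders exist elsewhere; it is not the RegionCLIP code.

```python
import numpy as np

def match_regions_to_captions(region_feats, caption_feats):
    """Pair each image-region embedding with its most similar template
    caption (by cosine similarity), yielding pseudo region-text pairs."""
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    c = caption_feats / np.linalg.norm(caption_feats, axis=1, keepdims=True)
    sims = r @ c.T                              # (num_regions, num_captions)
    best = sims.argmax(axis=1)                  # best caption per region
    return best, sims[np.arange(len(best)), best]
```

The resulting (region, caption) pairs can then serve as targets for a pretraining loss that aligns the two modalities in feature space.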

【6】 Learning and Analyzing Generation Order for Undirected Sequence Models
Link: https://arxiv.org/abs/2112.09097

Authors: Yichen Jiang, Mohit Bansal
Note: EMNLP 2021 Findings (12 pages)
Abstract: Undirected neural sequence models have achieved performance competitive with the state-of-the-art directed sequence models that generate monotonically from left to right in machine translation tasks. In this work, we train a policy that learns the generation order for a pre-trained, undirected translation model via reinforcement learning. We show that the translations decoded by our learned orders achieve higher BLEU scores than the outputs decoded from left to right or decoded by the learned order from Mansimov et al. (2019) on the WMT'14 German-English translation task. On examples with a maximum source and target length of 30 from De-En, WMT'16 English-Romanian, and WMT'21 English-Chinese translation tasks, our learned order outperforms all heuristic generation orders on four out of six tasks. We next carefully analyze the learned order patterns via qualitative and quantitative analysis. We show that our policy generally follows an outer-to-inner order, predicting the left-most and right-most positions first, and then moving toward the middle while skipping less important words at the beginning. Furthermore, the policy usually predicts positions for a single syntactic constituent structure in consecutive steps. We believe our findings could provide more insights on the mechanism of undirected generation models and encourage further research in this direction. Our code is publicly available at https://github.com/jiangycTarheel/undirected-generation

【7】 CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data
Link: https://arxiv.org/abs/2112.09081

Authors: Qi Yan, Jianhao Zheng, Simon Reding, Shanci Li, Iordan Doytchinov
Note: Preprint. Our code is available at this https URL
Abstract: We present a visual localization system that learns to estimate camera poses in the real world with the help of synthetic data. Despite significant progress in recent years, most learning-based approaches to visual localization target a single domain and require a dense database of geo-tagged images to function well. To mitigate the data scarcity issue and improve the scalability of the neural localization models, we introduce TOPO-DataGen, a versatile synthetic data generation tool that traverses smoothly between the real and virtual world, hinged on the geographic camera viewpoint. New large-scale sim-to-real benchmark datasets are proposed to showcase and evaluate the utility of the said synthetic data. Our experiments reveal that synthetic data generically enhances the neural network performance on real data. Furthermore, we introduce CrossLoc, a cross-modal visual representation learning approach to pose estimation that makes full use of the scene coordinate ground truth via self-supervision. Without any extra data, CrossLoc significantly outperforms the state-of-the-art methods and achieves substantially higher real-data sample efficiency. Our code is available at https://github.com/TOPO-EPFL/CrossLoc.

【8】 SanMove: Next Location Recommendation via Self-Attention Network
Link: https://arxiv.org/abs/2112.09076

Authors: Huifeng Li, Bin Wang, Sulei Zhu, Yanyan Xu
Abstract: Currently, next location recommendation plays a vital role in location-based social network applications and services. Although many methods have been proposed to solve this problem, three important challenges have not been well addressed so far: (1) most existing methods are based on recurrent networks, which are time-consuming to train on long sequences due to not allowing for full parallelism; (2) personalized preferences generally are not considered reasonably; (3) existing methods have rarely systematically studied how to efficiently utilize various auxiliary information (e.g., user ID and timestamp) in trajectory data and the spatio-temporal relations among non-consecutive locations. To address the above challenges, we propose a novel method named SanMove, a self-attention network based model, to predict the next location via capturing the long- and short-term mobility patterns of users. Specifically, SanMove introduces a long-term preference learning module, and it uses a self-attention module to capture users' long-term mobility patterns, which can represent personalized location preferences of users. Meanwhile, SanMove uses a spatial-temporal guided non-invasive self-attention (STNOVA) to exploit auxiliary information to learn short-term preferences. We evaluate SanMove with two real-world datasets, and demonstrate SanMove is not only faster than the state-of-the-art RNN-based prediction model but also outperforms the baselines for next location prediction.
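
As background for the self-attention modules mentioned above, a single-head scaled dot-product self-attention over an embedded sequence (e.g. a check-in trajectory) can be written in a few lines of NumPy. This is a textbook sketch, not SanMove's actual architecture or its STNOVA variant.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X
    of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ V, weights
```

Unlike a recurrent network, every position attends to every other position in one matrix product, which is what allows the full parallelism the abstract contrasts with RNN training.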

【9】 Progressive Graph Convolution Network for EEG Emotion Recognition
Link: https://arxiv.org/abs/2112.09069

Authors: Yijin Zhou, Fu Li, Yang Li, Youshuo Ji, Guangming Shi, Wenming Zheng, Lijian Zhang, Yuanfang Chen, Rui Cheng
Note: 11 pages, 5 figures
Abstract: Studies in the area of neuroscience have revealed the relationship between emotional patterns and brain functional regions, demonstrating that dynamic relationships between different brain regions are an essential factor affecting emotion recognition determined through electroencephalography (EEG). Moreover, in EEG emotion recognition, we can observe that clearer boundaries exist between coarse-grained emotions than those between fine-grained emotions, based on the same EEG data; this indicates the concurrence of large coarse- and small fine-grained emotion variations. Thus, the progressive classification process from coarse- to fine-grained categories may be helpful for EEG emotion recognition. Consequently, in this study, we propose a progressive graph convolution network (PGCN) for capturing this inherent characteristic in EEG emotional signals and progressively learning the discriminative EEG features. To fit different EEG patterns, we constructed a dual-graph module to characterize the intrinsic relationship between different EEG channels, containing the dynamic functional connections and static spatial proximity information of brain regions from neuroscience research. Moreover, motivated by the observation of the relationship between coarse- and fine-grained emotions, we adopt a dual-head module that enables the PGCN to progressively learn more discriminative EEG features, from coarse-grained (easy) to fine-grained categories (difficult), referring to the hierarchical characteristic of emotion. To verify the performance of our model, extensive experiments were conducted on two public datasets: SEED-IV and the multi-modal physiological emotion database (MPED).
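
The graph-convolution operation underlying models like PGCN can be sketched as a single symmetrically normalized layer over an EEG-channel adjacency matrix. This is a generic GCN layer for illustration, not the paper's dual-graph or dual-head modules.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer over EEG channels:
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), where A is the channel
    adjacency, H the per-channel features, W the learned weights."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt    # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)      # ReLU
```

Each output row mixes a channel's features with those of its neighbors, which is how the adjacency structure (functional connectivity or spatial proximity) enters the model.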

【10】 Solving Inverse Problems with NerfGANs
Link: https://arxiv.org/abs/2112.09061

Authors: Giannis Daras, Wen-Sheng Chu, Abhishek Kumar, Dmitry Lagun, Alexandros G. Dimakis
Note: 16 pages, 18 figures
Abstract: We introduce a novel framework for solving inverse problems using NeRF-style generative models. We are interested in the problem of 3-D scene reconstruction given a single 2-D image and known camera parameters. We show that naively optimizing the latent space leads to artifacts and poor novel view rendering. We attribute this problem to volume obstructions that are clear in the 3-D geometry and become visible in the renderings of novel views. We propose a novel radiance field regularization method to obtain better 3-D surfaces and improved novel views given single view observations. Our method naturally extends to general inverse problems, including inpainting, where one observes a single view only partially. We experimentally evaluate our method, achieving visual improvements and performance boosts over the baselines in a wide range of tasks. Our method achieves $30$-$40\%$ MSE reduction and $15$-$25\%$ reduction in LPIPS loss compared to the previous state of the art.
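
The latent-space optimization that the paper identifies as problematic can be illustrated with a toy stand-in: gradient descent on a latent code z to match a single observation, with a linear map G playing the role of the generator. This is purely didactic; a NeRF-style generator and the proposed radiance-field regularizer are far more involved.

```python
import numpy as np

def invert_latent(G, y, z0, lr=0.01, steps=1000):
    """Toy latent-space inversion: gradient descent on z to minimize
    ||G @ z - y||^2, where G stands in for a generator and y for the
    observed image."""
    z = z0.copy()
    for _ in range(steps):
        residual = G @ z - y
        z -= lr * 2.0 * G.T @ residual   # gradient of the squared error
    return z
```

With a real generator the loss is non-convex, which is exactly why naive inversion can land on latents whose novel-view renderings contain artifacts, motivating the extra regularization.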

【11】 Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs
Link: https://arxiv.org/abs/2112.09025

Authors: Ezgi Korkmaz
Note: Published in AAAI 2022
Abstract: The use of deep neural networks as function approximators has led to striking progress for reinforcement learning algorithms and applications. Yet the knowledge we have on decision boundary geometry and the loss landscape of neural policies is still quite limited. In this paper we propose a framework to investigate the decision boundary and loss landscape similarities across states and across MDPs. We conduct experiments in various games from the Arcade Learning Environment, and discover that high sensitivity directions for neural policies are correlated across MDPs. We argue that these high sensitivity directions support the hypothesis that non-robust features are shared across training environments of reinforcement learning agents. We believe our results reveal fundamental properties of the environments used in deep reinforcement learning training, and represent a tangible step towards building robust and reliable deep reinforcement learning agents.

【12】 Centralizing State-Values in Dueling Networks for Multi-Robot Reinforcement Learning Mapless Navigation
Link: https://arxiv.org/abs/2112.09012

Authors: Enrico Marchesini, Alessandro Farinelli
Note: 6 pages, 5 figures, 1 table. Accepted at IROS 2021
Abstract: We study the problem of multi-robot mapless navigation in the popular Centralized Training and Decentralized Execution (CTDE) paradigm. This problem is challenging when each robot considers its path without explicitly sharing observations with other robots and can lead to non-stationary issues in Deep Reinforcement Learning (DRL). The typical CTDE algorithm factorizes the joint action-value function into individual ones, to favor cooperation and achieve decentralized execution. Such factorization involves constraints (e.g., monotonicity) that limit the emergence of novel behaviors in an individual as each agent is trained starting from a joint action-value. In contrast, we propose a novel architecture for CTDE that uses a centralized state-value network to compute a joint state-value, which is used to inject global state information in the value-based updates of the agents. Consequently, each model computes its gradient update for the weights, considering the overall state of the environment. Our idea follows the insights of Dueling Networks, as a separate estimation of the joint state-value has both the advantage of improving sample efficiency, while providing each robot information whether the global state is (or is not) valuable. Experiments in a robotic navigation task with 2, 4, and 8 robots confirm the superior performance of our approach over prior CTDE methods (e.g., VDN, QMIX).
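
The dueling-style decomposition the abstract appeals to, keeping a separate state-value estimate and centering per-action advantages, can be sketched in one line. This is the generic dueling aggregation from the Dueling Networks literature, not the authors' multi-robot architecture.

```python
import numpy as np

def dueling_q(state_value, advantages):
    """Dueling-style aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Centering the advantages makes the value stream alone answer 'how good
    is this state'. Shapes: (batch, 1) and (batch, n_actions)."""
    return state_value + advantages - advantages.mean(axis=-1, keepdims=True)
```

Because the advantages are centered, the mean Q-value over actions equals the state-value estimate, which is the property that lets a separate (here, centralized) state-value stream carry global information.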

【13】 ADBCMM: Acronym Disambiguation by Building Counterfactuals and Multilingual Mixing
Link: https://arxiv.org/abs/2112.08991

Authors: Yixuan Weng, Fei Xia, Bin Li, Xiusheng Huang, Shizhu He, Kang Liu, Jun Zhao
Note: SDU@AAAI-2022
Abstract: Scientific documents often contain a large number of acronyms. Disambiguating these acronyms helps researchers better understand the meaning of the vocabulary in the documents. In the past, thanks to large amounts of data from English literature, the acronym disambiguation task was mainly applied to English documents. For other, low-resource languages, however, it is difficult to obtain good performance on this task, and it has received less attention, due to the lack of large amounts of annotated data. To address this issue, this paper proposes a new method for acronym disambiguation, named ADBCMM, which can significantly improve performance in low-resource languages by building counterfactuals and multilingual mixing. Specifically, by balancing data bias in low-resource languages, ADBCMM is able to improve test performance outside the dataset. In SDU@AAAI-22 - Shared Task 2: Acronym Disambiguation, the proposed method won first place in French and Spanish. You can reproduce our results at https://github.com/WENGSYX/ADBCMM.

【14】 A molecular generative model with genetic algorithm and tree search for cancer samples
Link: https://arxiv.org/abs/2112.08959

Authors: Sejin Park, Hyunju Lee
Abstract: Personalized medicine is expected to maximize the intended drug effects and minimize side effects by treating patients based on their genetic profiles. Thus, it is important to generate drugs based on the genetic profiles of diseases, especially in anticancer drug discovery. However, this is challenging because the vast chemical space and variations in cancer properties require a huge time resource to search for proper molecules. Therefore, an efficient and fast search method considering genetic profiles is required for de novo molecular design of anticancer drugs. Here, we propose a faster molecular generative model with genetic algorithm and tree search for cancer samples (FasterGTS). FasterGTS is constructed with a genetic algorithm and a Monte Carlo tree search with three deep neural networks: supervised learning, self-trained, and value networks, and it generates anticancer molecules based on the genetic profiles of a cancer sample. When compared to other methods, FasterGTS generated cancer sample-specific molecules with the general chemical properties required for cancer drugs within a limited number of samplings. We expect that FasterGTS contributes to anticancer drug generation.
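
A minimal genetic algorithm of the kind FasterGTS builds on, with tournament selection, one-point crossover, and bit-flip mutation over a toy fitness function, might look like the sketch below. It is illustrative only; the actual model couples a GA with Monte Carlo tree search and neural networks over molecular representations, and all names here are hypothetical.

```python
import numpy as np

def genetic_search(fitness, pop_size=30, genome_len=20, generations=60,
                   mutation_rate=0.02, seed=0):
    """Minimal genetic algorithm over bitstring genomes: tournament
    selection, one-point crossover, and bit-flip mutation."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, genome_len))
    for _ in range(generations):
        scores = np.array([fitness(g) for g in pop])

        def pick():  # tournament selection: fitter of two random picks
            i, j = rng.integers(0, pop_size, size=2)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        children = []
        for _ in range(pop_size):
            a, b = pick(), pick()
            cut = rng.integers(1, genome_len)            # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flips = rng.random(genome_len) < mutation_rate
            children.append(np.where(flips, 1 - child, child))
        pop = np.array(children)
    scores = np.array([fitness(g) for g in pop])
    return pop[scores.argmax()], int(scores.max())

# Toy fitness: maximize the number of 1-bits ("onemax").
best, score = genetic_search(lambda g: int(g.sum()))
```

In a molecular setting, the genome would encode a molecule and the fitness would score drug-like properties and sample-specific activity; the selection/crossover/mutation loop stays the same.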

【15】 MVSS-Net: Multi-View Multi-Scale Supervised Networks for Image Manipulation Detection
Link: https://arxiv.org/abs/2112.08935

Authors: Chengbo Dong, Xinru Chen, Ruohan Hu, Juan Cao, Xirong Li
Note: arXiv admin note: substantial text overlap with arXiv:2104.06832
Abstract: The key research question for image manipulation detection is how to learn generalizable features that are sensitive to manipulations in novel data, whilst specific to prevent false alarms on authentic images. Current research emphasizes the sensitivity, with the specificity mostly ignored. In this paper we address both aspects by multi-view feature learning and multi-scale supervision. By exploiting noise distribution and boundary artifacts surrounding tampered regions, the former aims to learn semantic-agnostic and thus more generalizable features. The latter allows us to learn from authentic images which are nontrivial to be taken into account by the prior art that relies on a semantic segmentation loss. Our thoughts are realized by a new network which we term MVSS-Net and its enhanced version MVSS-Net++. Comprehensive experiments on six public benchmark datasets justify the viability of the MVSS-Net series for both pixel-level and image-level manipulation detection.

【16】 Responsive parallelized architecture for deploying deep learning models in production environments
Link: https://arxiv.org/abs/2112.08933

Authors: Nikhil Verma, Krishna Prasad
Note: 20 pages
Abstract: Recruiters can easily shortlist candidates for jobs by viewing their curriculum vitae (CV) documents. The unstructured CV document holds a candidate's portfolio and named entities listing their details. The main aim of this study is to design and propose a web-oriented, highly responsive computational pipeline that systematically predicts CV entities using hierarchically refined label attention networks.

【17】 Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning
Link: https://arxiv.org/abs/2112.08932

Authors: Trevor Ablett, Bryan Chan, Jonathan Kelly
Note: Accepted at the NeurIPS 2021 Deep Reinforcement Learning Workshop, Sydney, Australia
Abstract: Effective exploration continues to be a significant challenge that prevents the deployment of reinforcement learning for many physical systems. This is particularly true for systems with continuous and high-dimensional state and action spaces, such as robotic manipulators. The challenge is accentuated in the sparse rewards setting, where the low-level state information required for the design of dense rewards is unavailable. Adversarial imitation learning (AIL) can partially overcome this barrier by leveraging expert-generated demonstrations of optimal behaviour and providing, essentially, a replacement for dense reward information. Unfortunately, the availability of expert demonstrations does not necessarily improve an agent's capability to explore effectively and, as we empirically show, can lead to inefficient or stagnated learning. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of, in addition to a main task, multiple auxiliary tasks. Subsequently, a hierarchical model is used to learn each task reward and policy through a modified AIL procedure, in which exploration of all tasks is enforced via a scheduler composing different tasks together. This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible. Our experimental results in a challenging multitask robotic manipulation domain indicate that our method compares favourably to supervised imitation learning and to a state-of-the-art AIL method. Code is available at https://github.com/utiasSTARS/lfgp.

【18】 COVID-19 Electrocardiograms Classification using CNN Models
Link: https://arxiv.org/abs/2112.08931

作者:Ismail Shahin,Ali Bou Nassif,Mohamed Bader Alsabek 备注:5 pages, 4 figures, accepted in the 14th International Conference on Developments in eSystems Engineering, 7-10 December, 2021 摘要:随着COVID-19的周期性上升和下降以及许多国家受到其影响的影响,全世界科学家、研究者和医生已经做了大量的工作。迫切需要及时干预,以应对该疾病的不合理传播。通过应用深度学习算法的基础知识,人工智能(AI)的实现为数字健康区做出了重大贡献。在2019冠状病毒疾病诊断中,提出了一种新的方法,即利用深度学习算法,特别是卷积神经网络(CNN)模型,利用心电图数据自动诊断COVID-19。该框架中使用了几个CNN模型,包括VGG16、VGG19、InceptionResnetv2、InceptionV3、Resnet50和Densenet201。VGG16模型优于其他模型,准确率为85.92%。我们的结果表明,与VGG16模型相比,其余模型的精度相对较低,这是由于所使用的数据集较小,此外,仅对VGG16模型使用网格搜索超参数优化方法。此外,我们的结果是预备性的,并且有可能通过进一步扩展数据集和采用合适的超参数优化技术来提高所有模型的准确性。 摘要:With the periodic rise and fall of COVID-19 and numerous countries being affected by its ramifications, there has been a tremendous amount of work that has been done by scientists, researchers, and doctors all over the world. Prompt intervention is keenly needed to tackle the unconscionable dissemination of the disease. The implementation of Artificial Intelligence (AI) has made a significant contribution to the digital health district by applying the fundamentals of deep learning algorithms. In this study, a novel approach is proposed to automatically diagnose the COVID-19 by the utilization of Electrocardiogram (ECG) data with the integration of deep learning algorithms, specifically the Convolutional Neural Network (CNN) models. Several CNN models have been utilized in this proposed framework, including VGG16, VGG19, InceptionResnetv2, InceptionV3, Resnet50, and Densenet201. The VGG16 model has outperformed the rest of the models, with an accuracy of 85.92%. Our results show a relatively low accuracy in the rest of the models compared to the VGG16 model, which is due to the small size of the utilized dataset, in addition to the exclusive utilization of the Grid search hyperparameters optimization approach for the VGG16 model only. 
Moreover, our results are preparatory, and there is a possibility to enhance the accuracy of all models by further expanding the dataset and adapting a suitable hyperparameters optimization technique.
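作为补充,下面用纯Python给出网格搜索超参数优化思路的一个极简示意(假设性草图:训练评估函数以玩具函数代替,学习率与批大小的取值仅为演示,与论文实验无关):

```python
import itertools

def grid_search(train_eval, param_grid):
    """穷举评估每一种超参数组合,返回得分最高的一组(分数越高越好)。"""
    keys = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = train_eval(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# "训练CNN并返回验证准确率"的玩具替身:
# 这里设定 lr=1e-3、batch_size=32 时准确率最高。
def toy_train_eval(p):
    return 0.86 - abs(p["lr"] - 1e-3) * 10 - abs(p["batch_size"] - 32) / 1000

grid = {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [16, 32, 64]}
best, score = grid_search(toy_train_eval, grid)
print(best)  # {'batch_size': 32, 'lr': 0.001}
```

实际做法是把 `toy_train_eval` 换成"在该组超参数下训练模型并返回验证集准确率"的函数;论文中该步骤仅对VGG16执行过。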

【19】 Intelli-Paint: Towards Developing Human-like Painting Agents 标题:Intelli-Paint:迈向类人绘画智能体的开发 链接:https://arxiv.org/abs/2112.08930

作者:Jaskirat Singh,Cameron Smith,Jose Echevarria,Liang Zheng 摘要:生成设计良好的艺术品通常非常耗时,并且假定人类画家具有高度的熟练程度。为了促进人类的绘画过程,已经在教机器如何“像人类一样绘画”方面进行了大量的研究,然后使用经过训练的代理作为人类用户的绘画辅助工具。然而,当前这方面的研究通常依赖于基于网格的渐进式分割策略,其中代理将整个图像分割为连续的更精细网格,然后并行绘制每个网格。这不可避免地导致人工绘画序列,人类用户不容易理解。为了解决这个问题,我们提出了一种新的绘画方法,它可以学习生成输出画布,同时展示更人性化的绘画风格。建议的绘制管道Intelli Paint由1)渐进分层策略组成,该策略允许代理首先绘制自然背景场景表示,然后以渐进方式添加每个前景对象。2) 我们还介绍了一种新的顺序笔画引导策略,它可以帮助绘画代理以语义感知的方式在不同的图像区域之间转移注意力。3) 最后,我们提出了一种笔画规则化策略,该策略允许所需笔画总数减少约60-80%,而生成画布的质量没有任何明显差异。通过定量和定性结果,我们表明,生成的代理不仅提高了输出画布生成的效率,而且展示了更自然的绘画风格,这将更好地帮助人类用户通过数字艺术品表达他们的想法。 摘要:The generation of well-designed artwork is often quite time-consuming and assumes a high degree of proficiency on part of the human painter. In order to facilitate the human painting process, substantial research efforts have been made on teaching machines how to "paint like a human", and then using the trained agent as a painting assistant tool for human users. However, current research in this direction is often reliant on a progressive grid-based division strategy wherein the agent divides the overall image into successively finer grids, and then proceeds to paint each of them in parallel. This inevitably leads to artificial painting sequences which are not easily intelligible to human users. To address this, we propose a novel painting approach which learns to generate output canvases while exhibiting a more human-like painting style. The proposed painting pipeline Intelli-Paint consists of 1) a progressive layering strategy which allows the agent to first paint a natural background scene representation before adding in each of the foreground objects in a progressive fashion. 2) We also introduce a novel sequential brushstroke guidance strategy which helps the painting agent to shift its attention between different image regions in a semantic-aware manner. 
3) Finally, we propose a brushstroke regularization strategy which allows for ~60-80% reduction in the total number of required brushstrokes without any perceivable differences in the quality of the generated canvases. Through both quantitative and qualitative results, we show that the resulting agents not only show enhanced efficiency in output canvas generation but also exhibit a more natural-looking painting style which would better assist human users express their ideas through digital artwork.
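笔画正则化"减少笔画数而画面几乎不变"的思路,可以用一个玩具示意来说明(假设性草图:合并近乎重合的相邻笔画,与论文的具体正则化策略无关):

```python
def regularize_strokes(strokes, tol=0.05):
    """笔画正则化的玩具示意:丢弃与上一保留笔画在位置和颜色上
    都足够接近的笔画,从而减少笔画总数。strokes 为 (x, y, 灰度) 三元组。"""
    if not strokes:
        return []
    kept = [strokes[0]]
    for s in strokes[1:]:
        last = kept[-1]
        if all(abs(a - b) <= tol for a, b in zip(s, last)):
            # 与上一笔几乎重合:跳过(等价于合并进上一笔)
            continue
        kept.append(s)
    return kept

# 6笔中含多处近乎重复的笔画
strokes = [(0.10, 0.10, 0.5), (0.11, 0.10, 0.5), (0.12, 0.11, 0.5),
           (0.50, 0.50, 0.9), (0.51, 0.50, 0.9),
           (0.90, 0.20, 0.1)]
reduced = regularize_strokes(strokes)
print(len(strokes), "->", len(reduced))  # 6 -> 3
```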

【20】 Inherently Explainable Reinforcement Learning in Natural Language 标题:自然语言中的内在可解释强化学习 链接:https://arxiv.org/abs/2112.08907

作者:Xiangyu Peng,Mark O. Riedl,Prithviraj Ammanabrolu 摘要:我们专注于创建一个内在可解释的强化学习代理的任务——通过在执行任务时大声思考并事后分析整个轨迹以产生因果解释,能够立即产生局部解释。这种层次结构可解释的强化学习代理(HEX-RL)在交互式小说、基于文本的游戏环境中运行,在这种环境中,代理使用文本自然语言感知并作用于世界。这些游戏通常被设计成具有长期依赖性的谜题或任务,在这些谜题或任务中,代理必须完成一系列动作才能成功——提供理想的环境来测试代理解释其动作的能力。我们的代理被设计为将可解释性作为一级公民对待,使用提取的基于符号知识图的状态表示,再加上层次图注意机制,该机制指向内部图表示中对行为选择影响最大的事实。实验表明,该代理在强基线的基础上提供了显著改进的解释,如通常不熟悉环境的人类参与者所评价的,同时也与最先进的任务性能相匹配。 摘要:We focus on the task of creating a reinforcement learning agent that is inherently explainable -- with the ability to produce immediate local explanations by thinking out loud while performing a task and analyzing entire trajectories post-hoc to produce causal explanations. This Hierarchically Explainable Reinforcement Learning agent (HEX-RL), operates in Interactive Fictions, text-based game environments in which an agent perceives and acts upon the world using textual natural language. These games are usually structured as puzzles or quests with long-term dependencies in which an agent must complete a sequence of actions to succeed -- providing ideal environments in which to test an agent's ability to explain its actions. Our agent is designed to treat explainability as a first-class citizen, using an extracted symbolic knowledge graph-based state representation coupled with a Hierarchical Graph Attention mechanism that points to the facts in the internal graph representation that most influenced the choice of actions. Experiments show that this agent provides significantly improved explanations over strong baselines, as rated by human participants generally unfamiliar with the environment, while also matching state-of-the-art task performance.
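作为示意,下面用点积注意力说明"指向对行为选择影响最大的事实"这一解释机制的基本思路(假设性草图:事实向量与状态查询向量均为虚构,与HEX-RL的具体网络结构无关):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def explain_action(query, fact_vectors, fact_names):
    """对知识图谱中的每条事实与智能体状态查询做点积注意力打分;
    权重最大的事实即为该步动作的局部解释。"""
    scores = [sum(q * f for q, f in zip(query, fv)) for fv in fact_vectors]
    weights = softmax(scores)
    best = max(range(len(weights)), key=weights.__getitem__)
    return fact_names[best], weights

facts = ["door is locked", "key is in drawer", "lamp is on"]
vecs = [[1.0, 0.0], [0.9, 0.4], [0.0, 1.0]]
query = [1.0, 0.5]  # 智能体正决定"打开抽屉"
top_fact, w = explain_action(query, vecs, facts)
print(top_fact)  # key is in drawer
```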

【21】 Graph Structure Learning with Variational Information Bottleneck 标题:具有变化信息瓶颈的图结构学习 链接:https://arxiv.org/abs/2112.08903

作者:Qingyun Sun,Jianxin Li,Hao Peng,Jia Wu,Xingcheng Fu,Cheng Ji,Philip S. Yu 备注:Accepted by AAAI 2022, Preprint version with Appendix 摘要:图形神经网络(GNNs)在广泛的应用中显示了良好的结果。大多数GNNs的实证研究直接将观测图作为输入,假设观测结构完美地描述了节点之间精确而完整的关系。然而,现实世界中的图形不可避免地存在噪声或不完整,这甚至可能加剧图形表示的质量。在这项工作中,我们从信息论的角度提出了一种新的变分信息瓶颈引导图结构学习框架VIB-GSL。VIB-GSL提出了图形结构学习的信息瓶颈(IB)原则,为挖掘底层任务相关关系提供了一个更加优雅和通用的框架。VIB-GSL学习信息丰富的压缩图结构,以提取特定下游任务的可操作信息。VIB-GSL推导了不规则图形数据的变分近似,形成了易于处理的IB目标函数,有利于训练的稳定性。大量实验结果表明,VIB-GSL具有优越的有效性和鲁棒性。 摘要:Graph Neural Networks (GNNs) have shown promising results on a broad spectrum of applications. Most empirical studies of GNNs directly take the observed graph as input, assuming the observed structure perfectly depicts the accurate and complete relations between nodes. However, graphs in the real world are inevitably noisy or incomplete, which could even exacerbate the quality of graph representations. In this work, we propose a novel Variational Information Bottleneck guided Graph Structure Learning framework, namely VIB-GSL, in the perspective of information theory. VIB-GSL advances the Information Bottleneck (IB) principle for graph structure learning, providing a more elegant and universal framework for mining underlying task-relevant relations. VIB-GSL learns an informative and compressive graph structure to distill the actionable information for specific downstream tasks. VIB-GSL deduces a variational approximation for irregular graph data to form a tractable IB objective function, which facilitates training stability. Extensive experimental results demonstrate that the superior effectiveness and robustness of VIB-GSL.
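作为参考,下面给出标准变分信息瓶颈目标的一个极简数值示意(这是通用的VIB形式,并非VIB-GSL在图结构上的具体目标函数;beta、mu、log_var 的取值均为假设):

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ),对各维求和。"""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def vib_loss(task_nll, mu, log_var, beta=0.01):
    """可处理的变分IB目标:在拟合下游任务(task_nll)的同时,
    用KL项把学到的表示压向无信息先验,实现"信息压缩"。"""
    return task_nll + beta * kl_to_standard_normal(mu, log_var)

# 标准正态的潜变量不携带任何信息:KL项恰为0
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
loss = vib_loss(task_nll=0.7, mu=[0.5, -0.2], log_var=[-0.1, 0.2], beta=0.01)
```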

【22】 Adapting Document-Grounded Dialog Systems to Spoken Conversations using Data Augmentation and a Noisy Channel Model 标题:使用数据增强和噪声信道模型使基于文档的对话系统适应口语对话 链接:https://arxiv.org/abs/2112.08844

作者:David Thulke,Nico Daheim,Christian Dugast,Hermann Ney 备注:Accepted to the DSTC10 workshop at AAAI 2022 摘要:本文总结了我们对第十届对话系统技术挑战(DSTC10)“基于知识的面向任务的口语对话建模”第二轨道任务2的提交。与前一年的迭代类似,该任务由三个子任务组成:检测一个回合是否是知识寻求,选择相关的知识文档,最后生成扎根的响应。今年,重点在于使系统适应嘈杂的ASR成绩单。我们探索了不同的方法,使模型对这种类型的输入更加健壮,并使生成的响应适应口语对话的风格。对于后者,我们使用噪声信道模型获得最佳结果,该模型还减少了短响应和一般响应的数量。我们最好的系统在挑战的自动评估中排名第一,在人类评估中排名第三。 摘要:This paper summarizes our submission to Task 2 of the second track of the 10th Dialog System Technology Challenge (DSTC10) "Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations". Similar to the previous year's iteration, the task consists of three subtasks: detecting whether a turn is knowledge seeking, selecting the relevant knowledge document and finally generating a grounded response. This year, the focus lies on adapting the system to noisy ASR transcripts. We explore different approaches to make the models more robust to this type of input and to adapt the generated responses to the style of spoken conversations. For the latter, we get the best results with a noisy channel model that additionally reduces the number of short and generic responses. Our best system achieved the 1st rank in the automatic and the 3rd rank in the human evaluation of the challenge.
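噪声信道重排序"同时抑制短而泛化的回复"的思路可以用一个玩具打分函数来示意(假设性草图:各项权重与长度奖励的形式仅为演示,并非论文的具体参数化):

```python
def noisy_channel_score(direct_lp, channel_lp, lm_lp, n_tokens,
                        lam=0.5, mu=0.5, length_bonus=0.1):
    """对候选回复r重排序:综合直接模型 log p(r|c)、信道模型 log p(c|r)
    与语言模型先验 log p(r);长度奖励项抑制过短、过于泛化的回复。"""
    return direct_lp + lam * channel_lp + mu * lm_lp + length_bonus * n_tokens

# 两个候选:简短泛化的回复 vs 扎根于知识文档的具体回复
generic = noisy_channel_score(direct_lp=-2.0, channel_lp=-9.0, lm_lp=-1.0, n_tokens=3)
grounded = noisy_channel_score(direct_lp=-3.0, channel_lp=-2.0, lm_lp=-4.0, n_tokens=12)
print(grounded > generic)  # True:信道项惩罚无法"解释回"上下文的泛化回复
```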

【23】 Bridging between Cognitive Processing Signals and Linguistic Features via a Unified Attentional Network 标题:通过统一注意网络在认知加工信号和语言特征之间架起桥梁 链接:https://arxiv.org/abs/2112.08831

作者:Yuqi Ren,Deyi Xiong 摘要:认知加工信号可以用来改善自然语言处理(NLP)任务。然而,这些信号如何与语言信息相关尚不清楚。人类语言处理和语言特征之间的桥梁在神经语言学中得到了广泛的研究,通常是通过高控制刺激的单变量控制实验。这种方法不仅损害了自然阅读的真实性,而且耗时且昂贵。在本文中,我们提出了一种数据驱动的方法来研究认知加工信号与语言特征之间的关系。具体来说,我们提出了一个统一的注意框架,该框架由嵌入层、注意层、编码层和预测层组成,以选择性地将认知加工信号映射到语言特征。我们将映射过程定义为桥接任务,并针对词汇、句法和语义特征开发了12个桥接任务。该框架只需要记录在自然阅读下的认知加工信号作为输入,并且可以使用单个认知数据集检测广泛的语言特征。实验结果的观察结果与先前的神经科学发现一致。除此之外,我们的实验还揭示了一些有趣的发现,例如上下文眼动特征和句子时态之间的相关性。 摘要:Cognitive processing signals can be used to improve natural language processing (NLP) tasks. However, it is not clear how these signals correlate with linguistic information. Bridging between human language processing and linguistic features has been widely studied in neurolinguistics, usually via single-variable controlled experiments with highly-controlled stimuli. Such methods not only compromises the authenticity of natural reading, but also are time-consuming and expensive. In this paper, we propose a data-driven method to investigate the relationship between cognitive processing signals and linguistic features. Specifically, we present a unified attentional framework that is composed of embedding, attention, encoding and predicting layers to selectively map cognitive processing signals to linguistic features. We define the mapping procedure as a bridging task and develop 12 bridging tasks for lexical, syntactic and semantic features. The proposed framework only requires cognitive processing signals recorded under natural reading as inputs, and can be used to detect a wide range of linguistic features with a single cognitive dataset. Observations from experiment results resonate with previous neuroscience findings. In addition to this, our experiments also reveal a number of interesting findings, such as the correlation between contextual eye-tracking features and tense of sentence.

【24】 Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning 标题:用于无监督图表示学习的图式公共潜在因子提取 链接:https://arxiv.org/abs/2112.08830

作者:Thilini Cooray,Ngai-Man Cheung 备注:Accepted to AAAI 2022 摘要:无监督图级表示学习在分子性质预测和群体分析等各种任务中起着至关重要的作用,尤其是在数据注释费用昂贵的情况下。目前,大多数性能最好的图嵌入方法都是基于Infomax原理的。这些方法的性能在很大程度上取决于阴性样本的选择,如果不仔细选择样本,则会损害性能。如果用于相似性匹配的选定图集质量较低,则基于图间相似性的方法也会受到影响。为了解决这个问题,我们只关注利用当前输入图进行嵌入学习。我们的动机来自于对真实世界图形生成过程的观察,其中图形是基于一个或多个全局因素形成的,这些全局因素对图形的所有元素都是通用的(例如,讨论主题、分子的溶解度水平)。我们假设提取这些共同因素可能非常有益。因此,本文提出了一种新的无监督图表示学习原理:图态公共潜在因子提取(GCFX)。我们进一步提出了一个GCFX的深层模型deepGCFX,该模型基于逆转上述图形生成过程的思想,该过程可以明确地从输入图形中提取常见的潜在因素,并在下游任务上达到目前最先进的水平。通过大量的实验和分析,我们证明,虽然提取公共潜在因素有助于图形级任务减轻因单个节点或局部邻域的局部变化而引起的分心,但它也有助于节点级任务实现远程节点依赖,特别是对于非分解图。 摘要:Unsupervised graph-level representation learning plays a crucial role in a variety of tasks such as molecular property prediction and community analysis, especially when data annotation is expensive. Currently, most of the best-performing graph embedding methods are based on Infomax principle. The performance of these methods highly depends on the selection of negative samples and hurt the performance, if the samples were not carefully selected. Inter-graph similarity-based methods also suffer if the selected set of graphs for similarity matching is low in quality. To address this, we focus only on utilizing the current input graph for embedding learning. We are motivated by an observation from real-world graph generation processes where the graphs are formed based on one or more global factors which are common to all elements of the graph (e.g., topic of a discussion thread, solubility level of a molecule). We hypothesize extracting these common factors could be highly beneficial. Hence, this work proposes a new principle for unsupervised graph representation learning: Graph-wise Common latent Factor EXtraction (GCFX). We further propose a deep model for GCFX, deepGCFX, based on the idea of reversing the above-mentioned graph generation process which could explicitly extract common latent factors from an input graph and achieve improved results on downstream tasks to the current state-of-the-art. 
Through extensive experiments and analysis, we demonstrate that, while extracting common latent factors is beneficial for graph-level tasks to alleviate distractions caused by local variations of individual nodes or local neighbourhoods, it also benefits node-level tasks by enabling long-range node dependencies, especially for disassortative graphs.

【25】 An Unsupervised Way to Understand Artifact Generating Internal Units in Generative Neural Networks 标题:一种无监督理解产生式神经网络中伪迹生成内部单元的方法 链接:https://arxiv.org/abs/2112.08814

作者:Haedong Jeong,Jiyeon Han,Jaesik Choi 备注:AAAI22 accepted paper 摘要:尽管生成性对抗网络(GAN)的图像生成性能有了显著改善,但仍观察到低视觉保真度的生成。由于广泛使用的GAN指标更多地关注模型的整体性能,因此对单个世代的质量评估或缺陷世代的检测具有挑战性。虽然最近的研究试图检测导致伪影的featuremap单元并评估单个样本,但这些方法需要额外的资源,如外部网络或大量训练数据来近似真实的数据流形。在这项工作中,我们提出了局部激活的概念,并设计了一个关于局部激活的度量来检测工件的生成,而无需额外的监督。我们的经验证明,我们的方法可以检测和纠正来自具有各种数据集的GAN的工件生成。最后,我们讨论了几何分析,以部分揭示所提出的概念和低视觉保真度之间的关系。 摘要:Despite significant improvements on the image generation performance of Generative Adversarial Networks (GANs), generations with low visual fidelity still have been observed. As widely used metrics for GANs focus more on the overall performance of the model, evaluation on the quality of individual generations or detection of defective generations is challenging. While recent studies try to detect featuremap units that cause artifacts and evaluate individual samples, these approaches require additional resources such as external networks or a number of training data to approximate the real data manifold. In this work, we propose the concept of local activation, and devise a metric on the local activation to detect artifact generations without additional supervision. We empirically verify that our approach can detect and correct artifact generations from GANs with various datasets. Finally, we discuss a geometrical analysis to partially reveal the relation between the proposed concept and low visual fidelity.
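"无需额外监督、用单元激活统计量找出伪影单元"的思路可以用一个离群检测玩具示意来说明(假设性草图:这里用z分数代替论文定义的局部激活度量):

```python
import statistics

def flag_artifact_units(local_activations, z_thresh=2.0):
    """标记其统计量显著偏离整体单元分布的特征图单元
    (候选的伪影生成单元)。"""
    mean = statistics.fmean(local_activations)
    std = statistics.pstdev(local_activations)
    return [i for i, a in enumerate(local_activations)
            if std > 0 and abs(a - mean) / std > z_thresh]

# 9个正常单元和1个统计量极端的离群单元
acts = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 0.98, 1.02, 6.0]
print(flag_artifact_units(acts))  # [9]
```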

【26】 Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing 标题:显著性嫁接:结合校准标签混合的无害归因引导Mixup 链接:https://arxiv.org/abs/2112.08796

作者:Joonhyung Park,June Yong Yang,Jinwoo Shin,Sung Ju Hwang,Eunho Yang 备注:12 pages; Accepted to AAAI2022 摘要:Mixup方案通过混合一对样本来创建增强的训练样本,最近因能提高神经网络的泛化能力而受到相当大的关注。Mixup的一个简单且广泛使用的扩展是与区域丢弃(regional dropout)类方法相结合:从一个样本中移除随机图像块,并用另一个样本的特征加以替换。尽管这些方法简单有效,但由于其随机性,容易产生有害样本。为了解决这个问题,最近有人提出了"最大显著性"策略:只选择信息量最大的特征来防止上述现象。然而,由于总是确定性地选择显著性最大的区域,这些方法又缺乏样本多样性,从而在增强数据中引入偏差。在本文中,我们提出了一种新颖而简单的Mixup变体,兼具两者的优点。我们的想法有两方面:首先,通过随机采样特征并将其"嫁接"到另一个样本上,我们的方法能有效生成多样而有意义的样本;其次,以显著性校准的方式混合标签来生成嫁接样本的标签,从而纠正随机采样过程引入的监督误导。我们在CIFAR、Tiny-ImageNet和ImageNet数据集上的实验表明,我们的方案不仅在分类精度上优于当前最先进的增强策略,而且在应对数据损坏和对象遮挡等压力条件方面也更胜一筹。 摘要:The Mixup scheme suggests mixing a pair of samples to create an augmented training sample and has gained considerable attention recently for improving the generalizability of neural networks. A straightforward and widely used extension of Mixup is to combine with regional dropout-like methods: removing random patches from a sample and replacing it with the features from another sample. Albeit their simplicity and effectiveness, these methods are prone to create harmful samples due to their randomness. To address this issue, 'maximum saliency' strategies were recently proposed: they select only the most informative features to prevent such a phenomenon. However, they now suffer from lack of sample diversification as they always deterministically select regions with maximum saliency, injecting bias into the augmented data. In this paper, we present a novel, yet simple Mixup-variant that captures the best of both worlds. Our idea is two-fold. By stochastically sampling the features and 'grafting' them onto another sample, our method effectively generates diverse yet meaningful samples. Its second ingredient is to produce the label of the grafted sample by mixing the labels in a saliency-calibrated fashion, which rectifies supervision misguidance introduced by the random sampling procedure.
Our experiments under CIFAR, Tiny-ImageNet, and ImageNet datasets show that our scheme outperforms the current state-of-the-art augmentation strategies not only in terms of classification accuracy, but is also superior in coping under stress conditions such as data corruption and object occlusion.
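"以显著性校准的方式混合标签"这一核心步骤可以用如下极简示意说明(假设性草图:显著性图、掩码与标签均为虚构的一维玩具数据,实际方法作用于二维图像并含随机区域采样):

```python
def graft_label(label_a, label_b, saliency_a, saliency_b, mask):
    """按两幅图各自实际贡献给嫁接样本的显著性质量比例混合one-hot标签。
    mask[i]==1 表示像素i来自图B(被嫁接的图像块),0 表示来自图A。"""
    mass_b = sum(s for s, m in zip(saliency_b, mask) if m == 1)
    mass_a = sum(s for s, m in zip(saliency_a, mask) if m == 0)
    total = mass_a + mass_b
    lam = mass_b / total if total > 0 else 0.5
    return [(1 - lam) * a + lam * b for a, b in zip(label_a, label_b)]

# 嫁接块只占一半像素,却携带了B图绝大部分显著性:混合标签应由B主导
sal_a = [0.1, 0.1, 0.1, 0.1]
sal_b = [0.0, 0.0, 0.9, 0.9]
mask = [0, 0, 1, 1]
label = graft_label([1.0, 0.0], [0.0, 1.0], sal_a, sal_b, mask)
print(label)  # 约为 [0.1, 0.9]
```

与按面积比例混合标签相比,这种按显著性质量混合的方式避免了"嫁接进来的是背景却被当作前景监督"的误导。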

【27】 Utilizing Evidence Spans via Sequence-Level Contrastive Learning for Long-Context Question Answering 标题:基于序列级对比学习的证据跨度在长上下文问答中的应用 链接:https://arxiv.org/abs/2112.08777

作者:Avi Caciularu,Ido Dagan,Jacob Goldberger,Arman Cohan 摘要:远程Transformer模型在长上下文问答(QA)任务中取得了令人鼓舞的结果。这类任务通常需要对一个长文档进行推理,并且它们从识别一组证据跨度(例如句子)中获益,这些证据跨度为解决问题提供了支持性证据。在这项工作中,我们提出了一种新的方法,为远程Transformer配备额外的序列级目标,以便更好地识别支持证据跨度。我们通过在微调中提出一个额外的对比监督信号来实现这一点,鼓励该模型通过最大化问题-证据相似度来明确区分支持证据句和否定证据句。拟议的额外损失在三种不同的强长上下文转换器模型上表现出一致的改进,跨越了两个具有挑战性的问答基准——HotpotQA和QAsper。 摘要:Long-range transformer models have achieved encouraging results on long-context question answering (QA) tasks. Such tasks often require reasoning over a long document, and they benefit from identifying a set of evidence spans (e.g., sentences) that provide supporting evidence for addressing the question. In this work, we propose a novel method for equipping long-range transformers with an additional sequence-level objective for better identification of supporting evidence spans. We achieve this by proposing an additional contrastive supervision signal in finetuning, where the model is encouraged to explicitly discriminate supporting evidence sentences from negative ones by maximizing the question-evidence similarity. The proposed additional loss exhibits consistent improvements on three different strong long-context transformer models, across two challenging question answering benchmarks - HotpotQA and QAsper.
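"最大化问题-证据相似度"的对比监督信号,本质上是对问题与各句子相似度做交叉熵(假设性草图:向量均为虚构玩具数据,与论文的具体编码器无关):

```python
import math

def evidence_contrastive_loss(q, sents, positive_idx):
    """对问题-句子相似度做交叉熵:把支持证据句的相似度
    推高到负例句子之上(InfoNCE式目标)。"""
    sims = [sum(a * b for a, b in zip(q, s)) for s in sents]
    m = max(sims)
    log_z = m + math.log(sum(math.exp(x - m) for x in sims))
    return log_z - sims[positive_idx]

q = [1.0, 0.0]
sents = [[0.9, 0.1], [0.0, 1.0], [-0.5, 0.2]]  # 第0句是证据跨度
loss = evidence_contrastive_loss(q, sents, positive_idx=0)
# 把证据句当正例时,损失应低于把无关句当正例时
print(loss < evidence_contrastive_loss(q, sents, positive_idx=2))  # True
```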

【28】 Unsupervised Matching of Data and Text 标题:数据和文本的无监督匹配 链接:https://arxiv.org/abs/2112.08776

作者:Naser Ahmadi,Hansjorg Sand,Paolo Papotti 备注:Accepted at IEEE ICDE 2022 Code at this https URL 摘要:实体解析是一个被广泛研究的问题,有几个建议可以在关系中匹配记录。匹配文本内容在许多应用程序中是一项广泛的任务,例如问答和搜索。虽然最近的方法在这两项任务中取得了令人满意的结果,但对于匹配文本内容和结构化数据这一更普遍的问题,还没有明确的解决方案。我们引入了一个框架,该框架在无监督的环境下为任何一对语料库(关系表或文本文档)支持这项新任务。我们的方法在语料库的内容上构建一个细粒度的图,并派生单词嵌入来表示低维空间中要匹配的对象。学习到的表示能够在不同的粒度上实现有效的匹配,从关系元组到文本句子和段落。我们的灵活框架可以利用预先训练过的资源,但它不依赖于它们的存在,并且在词汇表特定于领域时,在匹配内容方面实现了更好的质量性能。我们还使用“扩展和压缩”方法在图形创建过程中引入优化,该方法首先确定跨元素的新有效关系,以改进匹配,然后修剪节点和边,以减小图形大小。在真实用例和公共数据集上的实验表明,我们的框架产生的嵌入在结果质量和执行时间上都优于单词嵌入和微调语言模型。 摘要:Entity resolution is a widely studied problem with several proposals to match records across relations. Matching textual content is a widespread task in many applications, such as question answering and search. While recent methods achieve promising results for these two tasks, there is no clear solution for the more general problem of matching textual content and structured data. We introduce a framework that supports this new task in an unsupervised setting for any pair of corpora, being relational tables or text documents. Our method builds a fine-grained graph over the content of the corpora and derives word embeddings to represent the objects to match in a low dimensional space. The learned representation enables effective and efficient matching at different granularity, from relational tuples to text sentences and paragraphs. Our flexible framework can exploit pre-trained resources, but it does not depends on their existence and achieves better quality performance in matching content when the vocabulary is domain specific. We also introduce optimizations in the graph creation process with an "expand and compress" approach that first identifies new valid relationships across elements, to improve matching, and then prunes nodes and edges, to reduce the graph size. 
Experiments on real use cases and public datasets show that our framework produces embeddings that outperform word embeddings and fine-tuned language models both in results' quality and in execution times.

【29】 CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking 标题:CODER:一种通过上下文化文档嵌入重排序改进检索的高效框架 链接:https://arxiv.org/abs/2112.08766

作者:George Zerveas,Navid Rekabsaz,Daniel Cohen,Carsten Eickhoff 摘要:我们提出了一个框架,用于以最小的计算成本提高一大类检索模型的性能。它利用由基本密集检索方法提取的预计算文档表示,并涉及训练一个模型,以便为每个查询联合评分一大组检索到的候选文档,同时可能在其他候选上下文以及查询本身中动态转换每个文档的表示。当根据文档表示与查询的相似性对文档表示进行评分时,该模型因此知道其“对等”文档的表示。我们表明,与基本方法相比,我们的方法在检索性能上有了实质性的改进,并且在相互隔离的情况下对候选文档进行了评分,就像在成对的训练环境中一样。至关重要的是,与基于类BERT编码器的术语交互重排器不同,它在运行时在任何第一阶段方法的基础上产生的计算开销可以忽略不计,因此可以轻松地与任何最先进的密集检索方法相结合。最后,同时考虑给定查询的一组候选文档可以在检索中实现额外的有价值的功能,例如分数校准和减轻排序中的社会偏见。 摘要:We present a framework for improving the performance of a wide class of retrieval models at minimal computational cost. It utilizes precomputed document representations extracted by a base dense retrieval method and involves training a model to jointly score a large set of retrieved candidate documents for each query, while potentially transforming on the fly the representation of each document in the context of the other candidates as well as the query itself. When scoring a document representation based on its similarity to a query, the model is thus aware of the representation of its "peer" documents. We show that our approach leads to substantial improvement in retrieval performance over the base method and over scoring candidate documents in isolation from one another, as in a pair-wise training setting. Crucially, unlike term-interaction rerankers based on BERT-like encoders, it incurs a negligible computational overhead on top of any first-stage method at run time, allowing it to be easily combined with any state-of-the-art dense retrieval method. Finally, concurrently considering a set of candidate documents for a given query enables additional valuable capabilities in retrieval, such as score calibration and mitigating societal biases in ranking.
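"打分时感知'同伴'候选文档"的联合重排序思路可以用一轮点积注意力来示意(假设性草图:嵌入向量为虚构玩具数据,真实系统作用于稠密检索器预计算的高维表示):

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def contextual_rerank(query, docs):
    """对同一查询检索到的候选文档联合打分:每个文档表示先经一轮
    候选间点积注意力与其"同伴"混合,再与查询求相似度。"""
    n, d = len(docs), len(docs[0])
    contextual = []
    for i in range(n):
        attn = softmax([sum(a * b for a, b in zip(docs[i], docs[j]))
                        for j in range(n)])
        mixed = [sum(attn[j] * docs[j][k] for j in range(n)) for k in range(d)]
        # 残差连接保留文档自身内容
        contextual.append([x + y for x, y in zip(docs[i], mixed)])
    return [sum(qk * ck for qk, ck in zip(query, c)) for c in contextual]

query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
scores = contextual_rerank(query, docs)
print(scores.index(max(scores)))  # 0:与查询最对齐的文档排第一
```

由于只在预计算的嵌入上做少量运算,这类重排序的运行时开销远小于BERT式逐词交互重排器。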

【30】 KnAC: an approach for enhancing cluster analysis with background knowledge and explanations 标题:KNAC:一种利用背景知识和解释增强聚类分析的方法 链接:https://arxiv.org/abs/2112.08759

作者:Szymon Bobek,Michał Kuk,Jakub Brzegowski,Edyta Brzychczy,Grzegorz J. Nalepa 备注:Submitted to Applied Intelligence 摘要:自几十年来,多维数据集中的模式发现一直是研究的主题。有许多聚类算法可用于此目的。然而,它们的实际应用在后聚类阶段有着共同点,这涉及到基于专家的解释和对所得结果的分析。我们认为,这可能是过程的瓶颈,尤其是在领域知识存在于集群之前的情况下。这种情况不仅需要对自动发现的集群进行适当的分析,还需要对现有知识进行一致性检查。在这项工作中,我们提出了知识增强聚类(KnAC),其主要目标是将基于专家的标记与自动聚类相结合,以更新和细化前者。我们的解决方案不依赖于任何现成的聚类算法,也不引入任何一种。取而代之的是,KnAC可以作为任意聚类算法的扩充,使该方法具有鲁棒性和模型不可知性。我们在人工、可复制的示例和真实的用例场景中演示了我们的方法的可行性。 摘要:Pattern discovery in multidimensional data sets has been a subject of research since decades. There exists a wide spectrum of clustering algorithms that can be used for that purpose. However, their practical applications share in common the post-clustering phase, which concerns expert-based interpretation and analysis of the obtained results. We argue that this can be a bottleneck of the process, especially in the cases where domain knowledge exists prior to clustering. Such a situation requires not only a proper analysis of automatically discovered clusters, but also a conformance checking with existing knowledge. In this work, we present Knowledge Augmented Clustering (KnAC), which main goal is to confront expert-based labelling with automated clustering for the sake of updating and refining the former. Our solution does not depend on any ready clustering algorithm, nor introduce one. Instead KnAC can serve as an augmentation of an arbitrary clustering algorithm, making the approach robust and model-agnostic. We demonstrate the feasibility of our method on artificially, reproducible examples and on a real life use case scenario.
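"将专家标注与自动聚类对质"的核心可以用一个列联统计示意(假设性草图:min_share 阈值与拆分建议规则为演示用简化,并非KnAC的完整流程):

```python
from collections import Counter

def confront(expert_labels, cluster_ids, min_share=0.2):
    """比较专家标注与自动聚类:若某个专家标签的成员分散在多个簇中
    (每个簇至少占该标签成员的 min_share),则建议细化/拆分该标签。"""
    suggestions = {}
    for label in set(expert_labels):
        members = [c for l, c in zip(expert_labels, cluster_ids) if l == label]
        counts = Counter(members)
        spread = [c for c, n in counts.items() if n / len(members) >= min_share]
        if len(spread) > 1:
            suggestions[label] = sorted(spread)
    return suggestions

expert = ["A", "A", "A", "A", "B", "B"]
clusters = [0, 0, 1, 1, 2, 2]
result = confront(expert, clusters)
print(result)  # {'A': [0, 1]} -> 建议考虑拆分标签A
```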

【31】 Forensic Analysis of Synthetically Generated Scientific Images 标题:合成科学图像的取证分析 链接:https://arxiv.org/abs/2112.08739

作者:Sara Mandelli,Davide Cozzolino,Joao P. Cardenuto,Daniel Moreira,Paolo Bestagini,Walter Scheirer,Anderson Rocha,Luisa Verdoliva,Stefano Tubaro,Edward J. Delp 摘要:合成内容的广泛传播是一个严重的威胁,需要采取紧急对策。合成内容的生成并不局限于视频、照片或音频序列等多媒体数据,而是涵盖了相当广泛的领域,也可以包括生物图像,如westernblot和显微镜图像。在这篇论文中,我们主要研究合成的westernblot图像的检测。生物医学文献中对Western blot图像进行了大量研究,已经证明这些图像很容易伪造,很少有希望通过目视检查或标准法医检测器发现操纵。为了克服缺少公开可用数据集的问题,我们创建了一个新的数据集,该数据集由三种不同的最先进的生成方法生成,包含超过14K的原始western blot图像和18K的合成western blot图像。然后,我们研究了不同的策略来检测合成蛋白质印迹,探索了二元分类方法以及一类检测器。在这两种情况下,我们从未在训练阶段利用合成的westernblot图像。取得的结果表明,合成产生的westernblot图像可以具有良好的准确性,即使开发的检测器没有优化这些科学图像的合成版本。 摘要:The widespread diffusion of synthetically generated content is a serious threat that needs urgent countermeasures. The generation of synthetic content is not restricted to multimedia data like videos, photographs, or audio sequences, but covers a significantly vast area that can include biological images as well, such as western-blot and microscopic images. In this paper, we focus on the detection of synthetically generated western-blot images. Western-blot images are largely explored in the biomedical literature and it has been already shown how these images can be easily counterfeited with few hope to spot manipulations by visual inspection or by standard forensics detectors. To overcome the absence of a publicly available dataset, we create a new dataset comprising more than 14K original western-blot images and 18K synthetic western-blot images, generated by three different state-of-the-art generation methods. Then, we investigate different strategies to detect synthetic western blots, exploring binary classification methods as well as one-class detectors. In both scenarios, we never exploit synthetic western-blot images at training stage. The achieved results show that synthetically generated western-blot images can be spot with good accuracy, even though the exploited detectors are not optimized over synthetic versions of these scientific images.
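"训练阶段从不使用合成图像"的一类(one-class)检测原则可以用最简形式示意(假设性草图:这里用二维随机特征和到质心的距离阈值,真实检测器基于学习得到的图像特征):

```python
import math
import random

def fit_one_class(real_features):
    """仅用真实图像特征拟合最小one-class检测器:
    记录特征质心和覆盖全部训练样本的半径。"""
    d = len(real_features[0])
    center = [sum(f[k] for f in real_features) / len(real_features)
              for k in range(d)]
    dists = [math.dist(f, center) for f in real_features]
    return center, max(dists)

def is_synthetic(feature, center, radius):
    """落在真实数据"流形"半径之外的样本判为合成。"""
    return math.dist(feature, center) > radius

random.seed(0)
real = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
center, radius = fit_one_class(real)
print(is_synthetic([8.0, 8.0], center, radius))  # True:远在真实分布之外
```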

【32】 Learning to Minimize Cost-to-Serve for Multi-Node Multi-Product Order Fulfilment in Electronic Commerce 标题:电子商务中多节点多产品订单执行的服务成本最小化研究 链接:https://arxiv.org/abs/2112.08736

作者:Pranavi Pathakota,Kunwar Zaid,Anulekha Dhara,Hardik Meisheri,Shaun D Souza,Dheeraj Shah,Harshad Khadilkar 摘要:我们描述了一个为响应零售电子商务(电商)需求而提出的新型决策问题。在与物流和零售业合作伙伴的合作中,我们发现从供应链中最合适的节点交付产品的成本(称为服务成本,cost-to-serve或CTS)是一个关键挑战。电子商务供应链的大规模、高度随机性和巨大的地理分布,使得此设置非常适合精心设计的数据驱动决策算法。在这项初步工作中,我们关注一个特定子问题:在每个时间段内,从任意仓库向多个客户交付任意数量的多种产品。我们比较了几种基线(包括启发式方法和混合整数线性规划)的相对性能和计算效率,并证明基于强化学习的算法与这些策略相比具有竞争力,且具有在现实世界中高效扩展的潜力。 摘要:We describe a novel decision-making problem developed in response to the demands of retail electronic commerce (e-commerce). While working with logistics and retail industry business collaborators, we found that the cost of delivery of products from the most opportune node in the supply chain (a quantity called the cost-to-serve or CTS) is a key challenge. The large scale, high stochasticity, and large geographical spread of e-commerce supply chains make this setting ideal for a carefully designed data-driven decision-making algorithm. In this preliminary work, we focus on the specific subproblem of delivering multiple products in arbitrary quantities from any warehouse to multiple customers in each time period. We compare the relative performance and computational efficiency of several baselines, including heuristics and mixed-integer linear programming. We show that a reinforcement learning based algorithm is competitive with these policies, with the potential of efficient scale-up in the real world.
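文中提到的启发式基线可以用一个贪心履约示意来说明(假设性草图:"每笔订单由成本最低的有货仓库发出",仓库、成本与订单数据均为虚构):

```python
def greedy_fulfil(orders, stock, cost):
    """贪心基线:每条 (产品, 数量, 客户) 订单行都从单位配送成本
    最低且库存充足的仓库发货,返回总服务成本(CTS)。"""
    total = 0.0
    for product, qty, customer in orders:
        candidates = [w for w in stock if stock[w].get(product, 0) >= qty]
        if not candidates:
            raise ValueError(f"no warehouse can serve {product}")
        w = min(candidates, key=lambda w: cost[w][customer])
        stock[w][product] -= qty
        total += qty * cost[w][customer]
    return total

stock = {"W1": {"p": 5}, "W2": {"p": 5}}
cost = {"W1": {"c1": 1.0, "c2": 4.0}, "W2": {"c1": 3.0, "c2": 2.0}}
orders = [("p", 2, "c1"), ("p", 3, "c2")]
total_cost = greedy_fulfil(orders, stock, cost)
print(total_cost)  # 2*1.0 + 3*2.0 = 8.0
```

这种逐单贪心不考虑未来需求,强化学习或混合整数规划基线正是为了在随机需求下做出更全局的分配。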

【33】 Pay More Attention to History: A Context Modeling Strategy for Conversational Text-to-SQL 标题:关注历史:一种对话式Text-to-SQL的上下文建模策略 链接:https://arxiv.org/abs/2112.08735

作者:Yuntao Li,Hanchu Zhang,Yutian Li,Sirui Wang,Wei Wu,Yan Zhang 摘要:会话文本到SQL旨在将多轮自然语言查询转换为相应的SQL表示。会话文本到SQL最棘手的问题之一,是对多轮查询的语义进行建模,并收集当前查询所需的适当信息。本文表明,通过添加每一轮及整个上下文的摘要来显式建模语义变化,可以在将会话查询转换为SQL方面带来更好的性能。特别地,我们提出了轮次粒度和会话粒度两个会话建模任务。这两个任务仅作为辅助训练任务,帮助进行多轮会话语义分析。我们在大规模开放域会话文本到SQL数据集上进行了实证研究,取得了新的最先进结果。结果表明,该机制显著提高了多轮语义分析的性能。 摘要:Conversational text-to-SQL aims at converting multi-turn natural language queries into their corresponding SQL representations. One of the most intractable problem of conversational text-to-SQL is modeling the semantics of multi-turn queries and gathering proper information required for the current query. This paper shows that explicit modeling the semantic changes by adding each turn and the summarization of the whole context can bring better performance on converting conversational queries into SQLs. In particular, we propose two conversational modeling tasks in both turn grain and conversation grain. These two tasks simply work as auxiliary training tasks to help with multi-turn conversational semantic parsing. We conducted empirical studies and achieve new state-of-the-art results on large-scale open-domain conversational text-to-SQL dataset. The results demonstrate that the proposed mechanism significantly improves the performance of multi-turn semantic parsing.

【34】 Self-Supervised Dynamic Graph Representation Learning via Temporal Subgraph Contrast 标题:基于时态子图对比的自监督动态图表示学习 链接:https://arxiv.org/abs/2112.08733

作者:Linpu Jiang,Ke-Jia Chen,Jingqiang Chen 摘要:图上的自监督学习由于其独立于标签和在表示上的鲁棒性,近年来受到了广泛的关注。目前对该主题的研究主要使用静态信息,如图结构,但不能很好地捕获动态信息,如边的时间戳。现实图形通常是动态的,这意味着节点之间的交互发生在特定的时间。提出了一种自监督动态图表示学习框架(DySubC),该框架定义了一个时态子图对比学习任务来同时学习动态图的结构和演化特征。具体而言,本文首先提出了一种新的时态子图采样策略,该策略以动态图的每个节点为中心节点,利用邻域结构和边缘时间戳对相应的时态子图进行采样。对每个子图中的节点进行编码后,根据邻域节点对中心节点的影响设计子图表示函数。最后,定义了结构和时间对比损失,以最大化节点表示和时间子图表示之间的互信息。在五个真实数据集上的实验表明:(1)DySubC在下游链路预测任务中的表现优于相关基线,包括两个图形对比学习模型和四个动态图形表示学习模型;(2)时间信息的使用不仅可以采样更有效的子图,但也要通过时间对比损失学习更好的表征。 摘要:Self-supervised learning on graphs has recently drawn a lot of attention due to its independence from labels and its robustness in representation. Current studies on this topic mainly use static information such as graph structures but cannot well capture dynamic information such as timestamps of edges. Realistic graphs are often dynamic, which means the interaction between nodes occurs at a specific time. This paper proposes a self-supervised dynamic graph representation learning framework (DySubC), which defines a temporal subgraph contrastive learning task to simultaneously learn the structural and evolutional features of a dynamic graph. Specifically, a novel temporal subgraph sampling strategy is firstly proposed, which takes each node of the dynamic graph as the central node and uses both neighborhood structures and edge timestamps to sample the corresponding temporal subgraph. The subgraph representation function is then designed according to the influence of neighborhood nodes on the central node after encoding the nodes in each subgraph. Finally, the structural and temporal contrastive loss are defined to maximize the mutual information between node representation and temporal subgraph representation. 
Experiments on five real-world datasets demonstrate that (1) DySubC performs better than the related baselines including two graph contrastive learning models and four dynamic graph representation learning models in the downstream link prediction task, and (2) the use of temporal information can not only sample more effective subgraphs, but also learn better representation by temporal contrastive loss.
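"以每个节点为中心、利用边时间戳采样时态子图"的思路可以用按时效加权的邻居采样示意(假设性草图:按与当前时刻的时间差倒数加权,与DySubC的具体采样分布未必一致):

```python
import random

def sample_temporal_subgraph(center, edges, now, k=2, seed=0):
    """围绕 center 采样时态子图:邻居按边的时效性
    (交互越新权重越大)以概率方式抽取。edges 为 (u, v, 时间戳) 三元组。"""
    rng = random.Random(seed)
    nbrs = [(v, t) for u, v, t in edges if u == center]
    if not nbrs:
        return [center]
    weights = [1.0 / (1 + now - t) for _, t in nbrs]
    chosen = set()
    while len(chosen) < min(k, len(nbrs)):
        v, = rng.choices([v for v, _ in nbrs], weights=weights)
        chosen.add(v)
    return [center] + sorted(chosen)

edges = [("u", "a", 1), ("u", "b", 9), ("u", "c", 10)]
sub = sample_temporal_subgraph("u", edges, now=10, k=2)
print(sub)  # 以"u"为中心、偏向近期邻居的子图节点列表
```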

【35】 GIMIRec: Global Interaction Information Aware Multi-Interest Framework for Sequential Recommendation 标题:GIMIRec:面向顺序推荐的全局交互信息感知多兴趣框架 链接:https://arxiv.org/abs/2112.08717

作者:Jie Zhang,Ke-Jia Chen,Jingqiang Chen 摘要:基于多兴趣框架的顺序推荐将用户最近的交互序列建模为多个不同的兴趣向量,因为单个低维向量不能完全代表用户兴趣的多样性。然而,现有的大多数模型只截取用户最近的交互行为作为训练数据,丢弃了大量的历史交互序列。这可能会引起两个问题。一方面,反映用户多重兴趣的数据缺失;另一方面,在历史用户项交互中,项之间的共存没有得到充分的探讨。为了解决这两个问题,本文提出了一种新的顺序推荐模型,称为“全局交互感知多兴趣顺序推荐框架(GIMIRec)”。具体来说,首先提出了一种全局上下文提取模块,该模块不引入任何外部信息,它根据所有用户的历史交互序列中每个项目对的约束共现数及其时间间隔计算加权共现矩阵,然后使用简化的图卷积得到每个项目的全局上下文嵌入。其次,捕获每个用户最近交互序列中每个项目对的时间间隔,并与全局上下文项目嵌入相结合,得到个性化的项目嵌入。最后,应用基于自我注意的多兴趣框架学习用户的不同兴趣,进行顺序推荐。在Amazon Books、淘宝Buy和Amazon Hybrid三个真实数据集上进行的大量实验表明,GIMIRec在召回率、NDCG和命中率指标上的性能明显优于最先进的方法。此外,所提出的全局上下文提取模块可以方便地移植到大多数顺序推荐模型中。 摘要:Sequential recommendation based on multi-interest framework models the user's recent interaction sequence into multiple different interest vectors, since a single low-dimensional vector cannot fully represent the diversity of user interests. However, most existing models only intercept users' recent interaction behaviors as training data, discarding a large amount of historical interaction sequences. This may raise two issues. On the one hand, data reflecting multiple interests of users is missing; on the other hand, the co-occurrence between items in historical user-item interactions is not fully explored. To tackle the two issues, this paper proposes a novel sequential recommendation model called "Global Interaction Aware Multi-Interest Framework for Sequential Recommendation (GIMIRec)". Specifically, a global context extraction module is firstly proposed without introducing any external information, which calculates a weighted co-occurrence matrix based on the constrained co-occurrence number of each item pair and their time interval from the historical interaction sequences of all users and then obtains the global context embedding of each item by using a simplified graph convolution. Secondly, the time interval of each item pair in the recent interaction sequence of each user is captured and combined with the global context item embedding to get the personalized item embedding. 
Finally, a self-attention based multi-interest framework is applied to learn the diverse interests of users for sequential recommendation. Extensive experiments on the three real-world datasets of Amazon-Books, Taobao-Buy and Amazon-Hybrid show that the performance of GIMIRec on the Recall, NDCG and Hit Rate indicators is significantly superior to that of the state-of-the-art methods. Moreover, the proposed global context extraction module can be easily transplanted to most sequential recommendation models.
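作为示意,下面用 NumPy 勾勒摘要中"加权共现矩阵 + 简化图卷积"两步的一个极简实现(均为假设的简化形式,非论文官方代码;`tau`、`window` 等参数为说明而虚构,简化图卷积参考了 LightGCN 式的无参数传播):

```python
import numpy as np

def weighted_cooccurrence(sequences, n_items, window=2, tau=86400.0):
    """根据所有用户的历史交互序列构造加权共现矩阵:
    同一窗口内的物品对按时间间隔指数衰减加权(衰减形式为假设)。"""
    A = np.zeros((n_items, n_items))
    for seq in sequences:                        # seq: [(item_id, timestamp), ...]
        for i in range(len(seq)):
            for j in range(i + 1, min(i + 1 + window, len(seq))):
                a, ta = seq[i]
                b, tb = seq[j]
                w = np.exp(-abs(tb - ta) / tau)  # 时间间隔越大,权重越小
                A[a, b] += w
                A[b, a] += w
    return A

def simplified_graph_conv(A, E, n_layers=2):
    """简化图卷积(LightGCN 式):去掉非线性与变换矩阵,
    用对称归一化邻接矩阵反复传播并对各层嵌入取平均。"""
    d = A.sum(axis=1)
    d[d == 0] = 1.0                              # 防止孤立物品除零
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt
    out, layer = E.copy(), E.copy()
    for _ in range(n_layers):
        layer = A_hat @ layer
        out += layer
    return out / (n_layers + 1)
```

对每个物品对按时间间隔做衰减加权,再用归一化邻接矩阵做无参数传播,即可得到每个物品的全局上下文嵌入。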

【36】 META: Mimicking Embedding via oThers' Aggregation for Generalizable Person Re-identification 标题:META:通过他人聚合模仿嵌入以实现可泛化的行人重识别 链接:https://arxiv.org/abs/2112.08684

作者:Boqiang Xu,Jian Liang,Lingxiao He,Zhenan Sun 摘要:域泛化(DG)行人重识别(ReID)旨在在训练时不访问目标域数据的情况下跨未知域进行测试,这是一个现实但具有挑战性的问题。与为不同领域假设相同模型的方法不同,混合专家(MoE)利用多个领域特定的网络来利用领域之间的互补信息,获得了令人印象深刻的结果。然而,现有的基于MoE的DG-ReID方法随着源域数目的增加,模型尺寸越来越大,并且大多数方法忽略了域不变特性的利用。为了解决上述两个问题,本文提出了一种新的DG-ReID方法,称为通过他人聚合模仿嵌入(META)。为了避免较大的模型尺寸,META中的专家不为每个源域添加分支网络,而是共享除批归一化层之外的所有参数。除了多个专家之外,META还利用实例归一化(IN)并将其引入到全局分支中,以追求跨域的不变特征。同时,META通过归一化统计量考虑未知目标样本和源域的相关性,并开发了一个聚合网络来自适应地集成多个专家以模拟未知目标域。得益于所提出的一致性损失和情景式(episodic)训练算法,我们可以期望META为真正未见过的目标域模拟嵌入。大量的实验证明,META大大超过了最先进的DG ReID方法。 摘要:Domain generalizable (DG) person re-identification (ReID) aims to test across unseen domains without access to the target domain data at training time, which is a realistic but challenging problem. In contrast to methods assuming an identical model for different domains, Mixture of Experts (MoE) exploits multiple domain-specific networks for leveraging complementary information between domains, obtaining impressive results. However, prior MoE-based DG ReID methods suffer from a large model size with the increase of the number of source domains, and most of them overlook the exploitation of domain-invariant characteristics. To handle the two issues above, this paper presents a new approach called Mimicking Embedding via oThers' Aggregation (META) for DG ReID. To avoid the large model size, experts in META do not add a branch network for each source domain but share all the parameters except for the batch normalization layers. Besides multiple experts, META leverages Instance Normalization (IN) and introduces it into a global branch to pursue invariant features across domains. Meanwhile, META considers the relevance of an unseen target sample and source domains via normalization statistics and develops an aggregation network to adaptively integrate multiple experts for mimicking unseen target domain. Benefiting from a proposed consistency loss and an episodic training algorithm, we can expect META to mimic embedding for a truly unseen target domain.
Extensive experiments verify that META surpasses state-of-the-art DG ReID methods by a large margin.
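摘要中"专家共享除批归一化层外的所有参数"这一设计,可以用如下极简示意(假设的简化实现,非论文代码)来说明:各域专家共用同一权重矩阵,仅 BN 的仿射参数按域独立,因此参数量不随源域数线性膨胀:

```python
import numpy as np

class DomainBNExpert:
    """极简示意(非论文实现):所有"专家"共享同一权重 W,
    仅批归一化(BN)的仿射参数 gamma/beta 按源域独立。"""

    def __init__(self, dim, n_domains, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((dim, dim)) / np.sqrt(dim)  # 共享参数
        self.gamma = np.ones((n_domains, dim))    # 每个域独立的 BN 缩放
        self.beta = np.zeros((n_domains, dim))    # 每个域独立的 BN 偏移

    def forward(self, x, domain, eps=1e-5):
        h = x @ self.W
        mu, var = h.mean(axis=0), h.var(axis=0)   # 批内统计量
        h = (h - mu) / np.sqrt(var + eps)
        return self.gamma[domain] * h + self.beta[domain]
```

在此结构上,每新增一个源域只需增加一组 BN 参数,而非一整个分支网络。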

【37】 DREAM: Uncovering Mental Models behind Language Models 标题:梦想:揭开语言模型背后的心理模型 链接:https://arxiv.org/abs/2112.08656

作者:Yuling Gu,Bhavana Dalvi Mishra,Peter Clark 摘要:在回答情境问题(例如,关于特定道德困境的问题)时,语言模型(LMs)在多大程度上构建了场景的"心智模型"?虽然认知科学已经表明,心智模型在人类解决问题中起着基础性作用,但目前尚不清楚现有LMs的高问答性能是否有类似的模型构建支持——如果没有,这是否可以解释它们众所周知的灾难性失败。我们观察到,Macaw(一种现有的基于T5的LM)在被探究时,为情境问题提供了一些有用但不充分的心智模型(估计准确率=43%,有用性=21%,一致性=42%)。我们提出了DREAM模型,该模型将情境问题作为输入,生成一个描述情境的心智模型,而不需要任何额外的特定于任务的心智模型训练数据。它通过对现有NLP资源的远程监督来继承其社会常识。我们的分析表明,与Macaw相比,DREAM可以产生显著更好的心智模型(估计准确率为67%,有用性为37%,一致性为71%)。最后,由DREAM生成的心智模型可以作为情境QA任务的附加上下文。在三个不同的数据集上,这种额外的上下文将Macaw零样本模型的答案准确性提高了+1%到+4%(绝对值)。 摘要:To what extent do language models (LMs) build "mental models" of a scene when answering situated questions (e.g., questions about a specific ethical dilemma)? While cognitive science has shown that mental models play a fundamental role in human problem-solving, it is unclear whether the high question-answering performance of existing LMs is backed by similar model building - and if not, whether that can explain their well-known catastrophic failures. We observed that Macaw, an existing T5-based LM, when probed provides somewhat useful but inadequate mental models for situational questions (estimated accuracy=43%, usefulness=21%, consistency=42%). We propose DREAM, a model that takes a situational question as input to produce a mental model elaborating the situation, without any additional task specific training data for mental models. It inherits its social commonsense through distant supervision from existing NLP resources. Our analysis shows that DREAM can produce significantly better mental models (estimated accuracy=67%, usefulness=37%, consistency=71%) compared to Macaw. Finally, mental models generated by DREAM can be used as additional context for situational QA tasks. This additional context improves the answer accuracy of a Macaw zero-shot model by between +1% and +4% (absolute) on three different datasets.

【38】 Learning Interpretable Models Through Multi-Objective Neural Architecture Search 标题:基于多目标神经结构搜索的可解释模型学习 链接:https://arxiv.org/abs/2112.08645

作者:Zachariah Carmichael,Tim Moon,Sam Ade Jacobs 摘要:深度学习的巨大进步在许多领域都带来了前所未有的成就。虽然深度神经网络的性能是无可置疑的,但这种模型的结构设计和可解释性并不重要。通过神经体系结构搜索(NAS)实现神经网络体系结构设计自动化的研究已经开始。最近的进展通过利用分布式计算和新的优化算法使这些方法更加实用。然而,在优化体系结构以实现可解释性方面几乎没有工作。为此,我们提出了一个多目标分布式NAS框架,该框架优化了任务性能和内省。我们利用非支配排序遗传算法(NSGA-II)和可解释人工智能(XAI)技术来奖励人类能够更好理解的体系结构。该框架在多个图像分类数据集上进行了评估。我们证明,对内省能力和任务错误进行联合优化会导致在可容忍的错误范围内执行更为分散的体系结构。 摘要:Monumental advances in deep learning have led to unprecedented achievements across a multitude of domains. While the performance of deep neural networks is indubitable, the architectural design and interpretability of such models are nontrivial. Research has been introduced to automate the design of neural network architectures through neural architecture search (NAS). Recent progress has made these methods more pragmatic by exploiting distributed computation and novel optimization algorithms. However, there is little work in optimizing architectures for interpretability. To this end, we propose a multi-objective distributed NAS framework that optimizes for both task performance and introspection. We leverage the non-dominated sorting genetic algorithm (NSGA-II) and explainable AI (XAI) techniques to reward architectures that can be better comprehended by humans. The framework is evaluated on several image classification datasets. We demonstrate that jointly optimizing for introspection ability and task error leads to more disentangled architectures that perform within tolerable error.

【39】 TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning 标题:TransZero++:用于零样本学习的交叉属性引导Transformer 链接:https://arxiv.org/abs/2112.08643

作者:Shiming Chen,Ziming Hong,Guo-Sen Xie,Jian Zhao,Xinge You,Shuicheng Yan,Ling Shao 备注:This is an extension of AAAI'22 paper (TransZero). Submitted to TPAMI. arXiv admin note: substantial text overlap with arXiv:2112.01683 摘要:零样本学习(Zero-shot learning,ZSL)通过将语义知识从可见类转移到不可见类来解决新类识别问题。现有的基于注意力的模型仅使用单向注意力,难以学习到单幅图像中高质量的区域特征,并且忽略了视觉特征的可迁移性和判别性属性定位。在本文中,我们提出了一种称为TransZero++的交叉属性引导Transformer网络,用于细化视觉特征,并为ZSL中语义增强的视觉嵌入表示学习准确的属性定位。TransZero++由属性$\rightarrow$视觉Transformer子网(AVT)和视觉$\rightarrow$属性Transformer子网(VAT)组成。具体地说,AVT首先采用特征增强编码器来缓解跨数据集问题,并通过减少区域特征之间纠缠的相对几何关系来提高视觉特征的可迁移性。然后,使用属性$\rightarrow$视觉解码器定位给定图像中与每个属性最相关的图像区域,以获得基于属性的视觉特征表示。类似地,VAT使用类似的特征增强编码器来细化视觉特征,这些特征进一步应用于视觉$\rightarrow$属性解码器,以学习基于视觉的属性特征。通过进一步引入语义协作损失,两个属性引导的Transformer通过语义协作学习相互学习语义增强的视觉嵌入。大量的实验表明,TransZero++在三个具有挑战性的ZSL基准上取得了新的最优结果。代码可从以下网址获得:\url{https://github.com/shiming-chen/TransZero_pp}。 摘要:Zero-shot learning (ZSL) tackles the novel class recognition problem by transferring semantic knowledge from seen classes to unseen ones. Existing attention-based models have struggled to learn inferior region features in a single image by solely using unidirectional attention, which ignore the transferability and discriminative attribute localization of visual features. In this paper, we propose a cross attribute-guided Transformer network, termed TransZero++, to refine visual features and learn accurate attribute localization for semantic-augmented visual embedding representations in ZSL. TransZero++ consists of an attribute$\rightarrow$visual Transformer sub-net (AVT) and a visual$\rightarrow$attribute Transformer sub-net (VAT). Specifically, AVT first takes a feature augmentation encoder to alleviate the cross-dataset problem, and improves the transferability of visual features by reducing the entangled relative geometry relationships among region features.
Then, an attribute$\rightarrow$visual decoder is employed to localize the image regions most relevant to each attribute in a given image for attribute-based visual feature representations. Analogously, VAT uses the similar feature augmentation encoder to refine the visual features, which are further applied in the visual$\rightarrow$attribute decoder to learn visual-based attribute features. By further introducing semantical collaborative losses, the two attribute-guided transformers teach each other to learn semantic-augmented visual embeddings via semantical collaborative learning. Extensive experiments show that TransZero++ achieves the new state-of-the-art results on three challenging ZSL benchmarks. The codes are available at: \url{https://github.com/shiming-chen/TransZero_pp}.
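属性$\rightarrow$视觉解码器的核心是让每个属性向量对图像区域特征做注意加权。下面是一个缩放点积注意的极简示意(假设的简化形式,省略了多头、残差等细节,函数名为本文虚构):

```python
import numpy as np

def attribute_visual_attention(attrs, regions):
    """属性->视觉注意示意:每个属性向量作为查询,对图像区域特征
    做缩放点积注意,加权求和得到基于属性的视觉特征。"""
    d = attrs.shape[1]
    scores = attrs @ regions.T / np.sqrt(d)             # (n_attr, n_region)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # 每行为 softmax 分布
    return weights @ regions                            # (n_attr, dim)
```

注意权重同时给出了每个属性在图像中"看"哪些区域,这正是属性定位的来源。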

【40】 Analyzing the Limits of Self-Supervision in Handling Bias in Language 标题:浅析自我监督在处理语言偏见中的局限性 链接:https://arxiv.org/abs/2112.08637

作者:Lisa Bauer,Karthik Gopalakrishnan,Spandana Gella,Yang Liu,Mohit Bansal,Dilek Hakkani-Tur 备注:16 pages, 1 figure 摘要:使用自然语言任务描述来提示输入已经成为一种流行的机制,可以在几乎没有上下文监督的情况下从大规模生成式语言模型中获得相当准确的输出。这也有助于深入了解语言模型仅通过在大规模未标注文本语料库上的自监督预训练,能在多大程度上捕捉广泛下游任务的语义。这样的模型自然也接触过大量不良内容,例如种族主义和性别歧视的语言,而在这些维度上对模型的认识研究还很有限。在本文中,我们定义并全面评估了这类语言模型在多大程度上捕获了四项偏见相关任务的语义:诊断、识别、提取和改写。我们为这些任务定义了三大类任务描述:陈述、问题和补全,每个类中有许多词汇变体。我们在几种解码方法和少样本(few-shot)示例设置下,研究了使用这些类别以及空任务描述进行提示在每项任务上的有效性。我们的分析表明,语言模型能够在不同的偏见维度(如性别和政治归属)上以差异很大的程度执行这些任务。我们相信,通过量化当前自监督目标在完成这些具有社会挑战性的任务方面的局限性,我们的工作是朝着无偏见语言模型迈出的重要一步。 摘要:Prompting inputs with natural language task descriptions has emerged as a popular mechanism to elicit reasonably accurate outputs from large-scale generative language models with little to no in-context supervision. This also helps gain insight into how well language models capture the semantics of a wide range of downstream tasks purely from self-supervised pre-training on massive corpora of unlabeled text. Such models have naturally also been exposed to a lot of undesirable content like racist and sexist language and there is limited work on awareness of models along these dimensions. In this paper, we define and comprehensively evaluate how well such language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing. We define three broad classes of task descriptions for these tasks: statement, question, and completion, with numerous lexical variants within each class. We study the efficacy of prompting for each task using these classes and the null task description across several decoding methods and few-shot examples. Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation. We believe our work is an important step towards unbiased language models by quantifying the limits of current self-supervision objectives at accomplishing such sociologically challenging tasks.

【41】 Call for Customized Conversation: Customized Conversation Grounding Persona and Knowledge 标题:呼吁定制对话:定制对话基础人物和知识 链接:https://arxiv.org/abs/2112.08619

作者:Yoonna Jang,Jungwoo Lim,Yuna Hur,Dongsuk Oh,Suhyune Son,Yeonsoo Lee,Donghoon Shin,Seungryong Kim,Heuiseok Lim 备注:Accepted paper at the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22) 摘要:人类通常通过利用话题的先验知识和与之交谈的人的背景信息进行对话。然而,现有的会话代理和数据集不考虑这样的综合信息,因此它们在生成正确地融合知识和人物角色的话语方面具有局限性。为了解决这个问题,我们引入了呼吁定制对话(call For Customized conversation, FoCus)数据集,其中定制的答案是利用用户的人物角色和维基百科知识构建的。为了评估预训练语言模型生成信息性和定制化话语的能力,我们使用了BART和GPT-2以及基于Transformer的模型。我们通过自动评分评估它们的生成能力,并对定性结果进行人工评估。我们通过提出的两个子任务,即人物角色基础(PG)和知识基础(KG),检验模型是否反映了足够的人物角色和知识。此外,我们还通过基础质量评估表明,我们数据中的话语是由适当的知识和人物角色构成的。 摘要:Humans usually have conversations by making use of prior knowledge about a topic and background information of the people whom they are talking to. However, existing conversational agents and datasets do not consider such comprehensive information, and thus they have a limitation in generating the utterances where the knowledge and persona are fused properly. To address this issue, we introduce a call For Customized conversation (FoCus) dataset where the customized answers are built with the user's persona and Wikipedia knowledge. To evaluate the abilities to make informative and customized utterances of pre-trained language models, we utilize BART and GPT-2 as well as transformer-based models. We assess their generation abilities with automatic scores and conduct human evaluations for qualitative results. We examine whether the model reflects adequate persona and knowledge with our proposed two sub-tasks, persona grounding (PG) and knowledge grounding (KG). Moreover, we show that the utterances of our data are constructed with the proper knowledge and persona through grounding quality assessment.

【42】 Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection 标题:一种域自适应目标检测的频谱增强一致性算法 链接:https://arxiv.org/abs/2112.08605

作者:Rui Liu,Yahong Han,Yaowei Wang,Qi Tian 摘要:领域自适应目标检测(DAOD)旨在提高训练和测试数据来自不同领域时检测器的泛化能力。考虑到显著的域差距,一些典型的方法,例如基于CycleGAN的方法,采用中间域逐步桥接源域和目标域。然而,基于CycleGAN的中间域缺少像素级或实例级的目标检测监督,这会导致语义差异。为了解决这个问题,在本文中,我们引入了一个频谱增强一致性(FSAC)框架,其中包含四种不同的低频滤波器操作。这样,我们就可以得到一系列的增广数据作为中间域。具体来说,我们提出了一个两阶段优化框架。在第一阶段中,我们利用所有原始和扩充的源数据来训练目标检测器。在第二阶段,采用带有伪标签的增广源数据和目标数据进行预测一致性的自训练,并利用均值教师(Mean Teacher)优化的教师模型进一步修正伪标签。在实验中,我们分别对单目标DAOD和复合目标DAOD进行了评估,证明了我们的方法的有效性。 摘要:Domain adaptive object detection (DAOD) aims to improve the generalization ability of detectors when the training and test data are from different domains. Considering the significant domain gap, some typical methods, e.g., CycleGAN-based methods, adopt the intermediate domain to bridge the source and target domains progressively. However, the CycleGAN-based intermediate domain lacks the pixel- or instance-level supervision for object detection, which leads to semantic differences. To address this problem, in this paper, we introduce a Frequency Spectrum Augmentation Consistency (FSAC) framework with four different low-frequency filter operations. In this way, we can obtain a series of augmented data as the intermediate domain. Concretely, we propose a two-stage optimization framework. In the first stage, we utilize all the original and augmented source data to train an object detector. In the second stage, augmented source and target data with pseudo labels are adopted to perform the self-training for prediction consistency. And a teacher model optimized using Mean Teacher is used to further revise the pseudo labels. In the experiment, we evaluate our method on the single- and compound- target DAOD separately, which demonstrate the effectiveness of our method.
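文中的低频滤波操作可以用二维 FFT 低通滤波来示意(假设的简化实现,非论文代码;`radius` 为虚构参数,取不同值即可得到一系列作为中间域的增广图像):

```python
import numpy as np

def low_freq_filter(img, radius):
    """低频滤波示意:对图像做二维 FFT,只保留频谱中心
    radius 以内的低频分量后逆变换,得到一张"中间域"增广图像。"""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```

半径足够大时退化为原图,半径越小保留的低频成分越少、风格差异越明显。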

【43】 Goal-Directed Story Generation: Augmenting Generative Language Models with Reinforcement Learning 标题:目标导向的故事生成:用强化学习增强生成语言模型 链接:https://arxiv.org/abs/2112.08593

作者:Amal Alabdulkarim,Winston Li,Lara J. Martin,Mark O. Riedl 备注:preprint 摘要:大型预训练生成式语言模型的出现为人工智能故事生成提供了一个通用框架,即通过对模型进行采样来创建延续故事的序列。但是,仅采样不足以生成故事。特别是,很难指导语言模型创建故事以达到特定的目标事件。我们提出了两种基于深度强化学习和奖励塑造的自动化技术来控制计算机生成故事的情节。第一种方法利用近端策略优化对现有的基于Transformer的语言模型进行微调,使其在生成文本续写的同时具备目标寻求能力。第二种方法从展开的故事中提取一个知识图,该知识图被具有图注意的策略网络用于选择由语言模型生成的候选延续。我们报告了与故事实现给定目标事件的频率相关的自动化指标,以及与基线和消融实验相比,人类参与者对连贯性和总体故事质量的排名。 摘要:The advent of large pre-trained generative language models has provided a common framework for AI story generation via sampling the model to create sequences that continue the story. However, sampling alone is insufficient for story generation. In particular, it is hard to direct a language model to create stories to reach a specific goal event. We present two automated techniques grounded in deep reinforcement learning and reward shaping to control the plot of computer-generated stories. The first utilizes proximal policy optimization to fine-tune an existing transformer-based language model to generate text continuations but also be goal-seeking. The second extracts a knowledge graph from the unfolding story, which is used by a policy network with graph attention to select a candidate continuation generated by a language model. We report on automated metrics pertaining to how often stories achieve a given goal event as well as human participant rankings of coherence and overall story quality compared to baselines and ablations.

【44】 Knowledge Graph Embedding in E-commerce Applications: Attentive Reasoning, Explanations, and Transferable Rules 标题:电子商务应用中的知识图嵌入:细心推理、解释和可转移规则 链接:https://arxiv.org/abs/2112.08589

作者:Wen Zhang,Shumin Deng,Mingyang Chen,Liang Wang,Qiang Chen,Feiyu Xiong,Xiangwen Liu,Huajun Chen 备注:Accepted at IJCKG2021 摘要:知识图谱(KG)将事实表示为三元组,在许多应用中得到了广泛的采用。链接预测和规则归纳等推理任务对于KG的发展非常重要。知识图谱嵌入(KGE)将KG的实体和关系嵌入到连续向量空间中,已被提出用于这些推理任务,并被证明是高效和鲁棒的。但在真实业务应用中应用和部署KGE的合理性和可行性尚未得到很好的探索。在本文中,我们讨论并报告了在一个真实的领域应用(电子商务)中部署KGE的经验。我们首先确定了电子商务KG系统的三个重要要求:1)专注推理,仅对更受关注的几个目标关系而非全部关系进行推理;2)解释,为预测提供解释,以帮助用户和业务运营者理解预测的原因;3)可转移的规则,生成可重用的规则,以加速KG在新系统中的部署。由于现有的KGE均无法满足所有这些要求,我们提出了一种新的KGE,即一种可解释的知识图谱注意网络,它通过对三元组之间的关联建模来进行预测,而不是单纯依赖其头实体、关系和尾实体的嵌入。它可以自动选择用于预测的关注三元组,并同时记录它们的贡献,从中可以方便地提供解释,并高效地生成可转移的规则。我们的经验表明,我们的方法能够满足我们电子商务应用中的所有三个要求,并在来自真实领域应用的数据集上优于典型基线。 摘要:Knowledge Graphs (KGs), representing facts as triples, have been widely adopted in many applications. Reasoning tasks such as link prediction and rule induction are important for the development of KGs. Knowledge Graph Embeddings (KGEs) embedding entities and relations of a KG into continuous vector spaces, have been proposed for these reasoning tasks and proven to be efficient and robust. But the plausibility and feasibility of applying and deploying KGEs in real-world applications has not been well-explored. In this paper, we discuss and report our experiences of deploying KGEs in a real domain application: e-commerce. We first identify three important desiderata for e-commerce KG systems: 1) attentive reasoning, reasoning over a few target relations of more concerns instead of all; 2) explanation, providing explanations for a prediction to help both users and business operators understand why the prediction is made; 3) transferable rules, generating reusable rules to accelerate the deployment of a KG to new systems. While no existing KGE could meet all these desiderata, we propose a novel one, an explainable knowledge graph attention network that makes prediction through modeling correlations between triples rather than purely relying on its head entity, relation and tail entity embeddings.
It can automatically select attentive triples for prediction and record their contribution at the same time, from which explanations could be easily provided and transferable rules could be efficiently produced. We empirically show that our method is capable of meeting all three desiderata in our e-commerce application and outperforms typical baselines on datasets from real domain applications.

【45】 Learning to acquire novel cognitive tasks with evolution, plasticity and meta-meta-learning 标题:学习获得具有进化性、可塑性和元学习的新认知任务 链接:https://arxiv.org/abs/2112.08588

作者:Thomas Miconi 摘要:在元学习中,使用外部算法对网络进行训练,以学习需要获取、存储和利用任务每个新实例的不可预测信息的任务。然而,由于进化的神经结构和突触可塑性机制,动物能够自动完成这些认知任务。在这里,我们基于神经科学建模框架,通过一系列简单的元学习任务,进化出具有可塑性连接的神经网络。由此产生的进化网络可以通过其进化的神经组织和可塑性结构的自发运作,自动获得训练过程中从未见过的新的简单认知任务。我们认为,关注自然学习中涉及的循环的多样性可能会为智能行为的出现提供有用的见解。 摘要:In meta-learning, networks are trained with external algorithms to learn tasks that require acquiring, storing and exploiting unpredictable information for each new instance of the task. However, animals are able to pick up such cognitive tasks automatically, as a result of their evolved neural architecture and synaptic plasticity mechanisms. Here we evolve neural networks, endowed with plastic connections, over a sizable set of simple meta-learning tasks based on a neuroscience modelling framework. The resulting evolved network can automatically acquire a novel simple cognitive task, never seen during training, through the spontaneous operation of its evolved neural organization and plasticity structure. We suggest that attending to the multiplicity of loops involved in natural learning may provide useful insight into the emergence of intelligent behavior.

【46】 SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning 标题:SGEITL:用于视觉常识推理的场景图增强图文学习 链接:https://arxiv.org/abs/2112.08587

作者:Zhecan Wang,Haoxuan You,Liunian Harold Li,Alireza Zareian,Suji Park,Yiqing Liang,Kai-Wei Chang,Shih-Fu Chang 备注:None 摘要:回答关于图像的复杂问题是机器智能的一个雄心勃勃的目标,它需要对图像、文本和常识的共同理解,以及强大的推理能力。近年来,多模态变换器在视觉常识推理(VCR)方面取得了巨大进展,它通过跨模态注意层共同理解视觉对象和文本标记。然而,这些方法并没有利用场景的丰富结构和对象之间的交互作用,这对于回答复杂的常识性问题至关重要。我们提出了一个场景图增强图像文本学习(SGEITL)框架,将视觉场景图融入常识推理。为了利用场景图结构,在模型结构层次上,我们提出了一种多跳图变换器,用于正则化跳之间的注意交互。在预训练方面,提出了一种场景图感知的预训练方法,以利用从视觉场景图中提取的结构知识。此外,我们还介绍了一种在弱监督的情况下使用文本注释来训练和生成与领域相关的视觉场景图的方法。在VCR和其他任务上进行的大量实验表明,与最先进的方法相比,性能显著提高,并证明了每个拟议组件的有效性。 摘要:Answering complex questions about images is an ambitious goal for machine intelligence, which requires a joint understanding of images, text, and commonsense knowledge, as well as a strong reasoning ability. Recently, multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning (VCR), by jointly understanding visual objects and text tokens through layers of cross-modality attention. However, these approaches do not utilize the rich structure of the scene and the interactions between objects which are essential in answering complex commonsense questions. We propose a Scene Graph Enhanced Image-Text Learning (SGEITL) framework to incorporate visual scene graphs in commonsense reasoning. To exploit the scene graph structure, at the model structure level, we propose a multihop graph transformer for regularizing attention interaction among hops. As for pre-training, a scene-graph-aware pre-training method is proposed to leverage structure knowledge extracted in the visual scene graph. Moreover, we introduce a method to train and generate domain-relevant visual scene graphs using textual annotations in a weakly-supervised manner. Extensive experiments on VCR and other tasks show a significant performance boost compared with the state-of-the-art methods and prove the efficacy of each proposed component.

【47】 A First Mathematical Runtime Analysis of the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) 标题:非支配排序遗传算法II(NSGA-II)的首次数学运行分析 链接:https://arxiv.org/abs/2112.08581

作者:Weijie Zheng,Yufei Liu,Benjamin Doerr 备注:Preprint version (not the camera-ready version) of one paper accepted in AAAI 2022 摘要:非支配排序遗传算法II(NSGA-II)是现实世界中应用最广泛的多目标进化算法(MOEA)。然而,与也通过数学方法分析的几个简单MOEA相比,迄今为止,还没有针对NSGA-II的此类研究。在这项工作中,我们证明了数学运行时分析对于NSGA-II也是可行的。作为特别的结果,我们证明了当种群规模比帕累托前沿规模大一个常数因子时,具有两个经典变异算子和三种不同的双亲选择方法的NSGA-II在基本OneMinMax和LOTZ基准函数上满足与SEMO和GSEMO算法相同的渐近运行时保证。然而,如果总体规模仅等于帕累托前沿的规模,则NSGA-II无法有效计算完整的帕累托前沿(对于指数级迭代次数,总体将始终错过帕累托前沿的恒定部分)。我们的实验证实了上述发现。 摘要:The non-dominated sorting genetic algorithm II (NSGA-II) is the most intensively used multi-objective evolutionary algorithm (MOEA) in real-world applications. However, in contrast to several simple MOEAs analyzed also via mathematical means, no such study exists for the NSGA-II so far. In this work, we show that mathematical runtime analyses are feasible also for the NSGA-II. As particular results, we prove that with a population size larger than the Pareto front size by a constant factor, the NSGA-II with two classic mutation operators and three different ways to select the parents satisfies the same asymptotic runtime guarantees as the SEMO and GSEMO algorithms on the basic OneMinMax and LOTZ benchmark functions. However, if the population size is only equal to the size of the Pareto front, then the NSGA-II cannot efficiently compute the full Pareto front (for an exponential number of iterations, the population will always miss a constant fraction of the Pareto front). Our experiments confirm the above findings.
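文中的 OneMinMax 基准与 NSGA-II 所依赖的非支配排序,可以用如下极简示意(朴素 O(n^2) 实现,非 NSGA-II 完整算法,省略了拥挤距离等步骤)来说明:在 OneMinMax 上,任意两个个体的两个目标之和相同,因而互不支配,所有个体都落在第一个前沿:

```python
def onemin_max(x):
    """OneMinMax 基准:同时最大化比特串中 0 的个数与 1 的个数。"""
    ones = sum(x)
    return (len(x) - ones, ones)

def dominates(p, q):
    """最大化意义下:p 支配 q 当且仅当 p 各目标都不差且至少一个严格更好。"""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def nondominated_sort(points):
    """朴素 O(n^2) 非支配排序:反复取出当前不被支配的个体作为一个前沿。"""
    fronts, remaining = [], list(range(len(points)))
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

这也直观解释了文中"种群规模需比帕累托前沿大一个常数因子"的条件:前沿上每个点都需要在种群中占据位置才能被完整保留。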

【48】 HampDTI: a heterogeneous graph automatic meta-path learning method for drug-target interaction prediction 标题:HampDTI:一种用于药物-靶相互作用预测的异构图自动元路径学习方法 链接:https://arxiv.org/abs/2112.08567

作者:Hongzhun Wang,Feng Huang,Wen Zhang 备注:9 pages, 4 figures 摘要:动机:确定药物-靶点相互作用(DTI)是药物重新定位的关键步骤。近年来,大量基因组学和药理学数据的积累形成了大量药物和靶点相关异质网络(HNs),这为开发基于HN的计算模型以准确预测DTI提供了新的机会。HN包含大量关于DTI的有用信息,但也包含不相关的数据,如何充分利用异构网络仍然是一个挑战。结果:本文提出了一种基于异构图元路径学习的DTI预测方法(HampDTI)。HampDTI自动从HN中学习药物和靶点之间的重要元路径,并生成元路径图。对于每个元路径图,从药物分子图和靶蛋白序列中学习到的特征作为节点属性,然后设计一个节点类型特定图卷积网络(NSGCN),有效地考虑节点类型信息(药物或靶点),以学习药物和靶点的嵌入。最后,结合来自多个元路径图的嵌入来预测新的DTI。在基准数据集上的实验表明,与最先进的DTI预测方法相比,我们提出的HampDTI具有更高的性能。更重要的是,HampDTI确定了DTI预测的重要元途径,这可以解释药物如何与HNs中的靶点连接。 摘要:Motivation: Identifying drug-target interactions (DTIs) is a key step in drug repositioning. In recent years, the accumulation of a large number of genomics and pharmacology data has formed mass drug and target related heterogeneous networks (HNs), which provides new opportunities of developing HN-based computational models to accurately predict DTIs. The HN implies lots of useful information about DTIs but also contains irrelevant data, and how to make the best of heterogeneous networks remains a challenge. Results: In this paper, we propose a heterogeneous graph automatic meta-path learning based DTI prediction method (HampDTI). HampDTI automatically learns the important meta-paths between drugs and targets from the HN, and generates meta-path graphs. For each meta-path graph, the features learned from drug molecule graphs and target protein sequences serve as the node attributes, and then a node-type specific graph convolutional network (NSGCN) which efficiently considers node type information (drugs or targets) is designed to learn embeddings of drugs and targets. Finally, the embeddings from multiple meta-path graphs are combined to predict novel DTIs. The experiments on benchmark datasets show that our proposed HampDTI achieves superior performance compared with state-of-the-art DTI prediction methods. More importantly, HampDTI identifies the important meta-paths for DTI prediction, which could explain how drugs connect with targets in HNs.

【49】 A prediction-based approach for online dynamic radiotherapy scheduling 标题:一种基于预测的在线动态放射治疗调度方法 链接:https://arxiv.org/abs/2112.08549

作者:Tu-San Pham,Antoine Legrain,Patrick De Causmaecker,Louis-Martin Rousseau 摘要:患者调度是一项困难的任务,因为它涉及处理随机因素,例如未知的患者到达流量。为癌症患者安排放射治疗也面临类似的问题。根治性患者需要在建议的期限内(即入院后14或28天)开始治疗,同时为需要在入院后1至3天内进行紧急治疗的姑息性患者保留治疗能力。大多数癌症中心通过为急诊患者保留固定数量的治疗时段来解决这个问题。然而,这种扁平预约(flat-reservation)方式并不理想,可能会导致急诊患者在某些天的治疗逾期,而在另一些天没有充分利用治疗能力,这也会导致根治性患者的治疗延迟。这一问题在拥挤的大型医院尤为严重。在本文中,我们提出了一种基于预测的在线动态放射治疗调度方法。首先使用整数规划将一个所有未来患者到达均预先已知的离线问题求解至最优。然后训练回归模型以识别患者到达模式与其理想等待时间之间的联系。训练好的回归模型随后被嵌入到基于预测的方法中,该方法根据患者的特征和日历的当前状态来安排患者。 摘要:Patient scheduling is a difficult task as it involves dealing with stochastic factors such as an unknown arrival flow of patients. Scheduling radiotherapy treatments for cancer patients faces a similar problem. Curative patients need to start their treatment within the recommended deadlines, i.e., 14 or 28 days after their admission while reserving treatment capacity for palliative patients who require urgent treatments within 1 to 3 days after their admission. Most cancer centers solve the problem by reserving a fixed number of treatment slots for emergency patients. However, this flat-reservation approach is not ideal and can cause overdue treatments for emergency patients on some days while not fully exploiting treatment capacity on some other days, which also leads to delaying treatment for curative patients. This problem is especially severe in large and crowded hospitals. In this paper, we propose a prediction-based approach for online dynamic radiotherapy scheduling. An offline problem where all future patient arrivals are known in advance is solved to optimality using Integer Programming. A regression model is then trained to recognize the links between patients' arrival patterns and their ideal waiting time. The trained regression model is then embedded in a prediction-based approach that schedules a patient based on their characteristics and the present state of the calendar.
The numerical results show that our prediction-based approach efficiently prevents overdue treatments for emergency patients while maintaining a good waiting time compared to other scheduling approaches based on a flat-reservation policy.
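文中第二步"训练回归模型识别到达模式与理想等待时间的联系"可用最小二乘回归做一个极简示意(假设的简化形式;实际标签来自整数规划求得的离线最优排程,此处的函数名与数据均为本文虚构):

```python
import numpy as np

def fit_wait_time_model(arrival_features, ideal_wait_times):
    """最小二乘回归示意:学习"到达模式特征 -> 理想等待时间"的映射。"""
    X = np.hstack([arrival_features, np.ones((len(arrival_features), 1))])  # 加偏置列
    coef, *_ = np.linalg.lstsq(X, ideal_wait_times, rcond=None)
    return coef

def predict_wait_time(coef, features):
    """用学到的系数预测新到达患者的理想等待时间。"""
    return np.hstack([features, np.ones((len(features), 1))]) @ coef
```

在线调度时,只需用当前日历状态与患者特征构造同样的特征向量即可得到等待时间的预测。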

【50】 NewsClaims: A New Benchmark for Claim Detection from News with Background Knowledge 标题:NewsClaims:一种基于背景知识的新闻索赔检测新基准 链接:https://arxiv.org/abs/2112.08544

作者:Revanth Gangi Reddy,Sai Chinthakindi,Zhenhailong Wang,Yi R. Fung,Kathryn S. Conger,Ahmed S. Elsayed,Martha Palmer,Heng Ji 备注:Preprint 摘要:索赔检测和核实对于理解新闻至关重要,并已成为减少新闻中错误信息的有前途的技术。然而,大多数现有工作侧重于索赔语句的分析,而忽略了关键的背景属性,如索赔人、索赔对象和与索赔相关的其他知识。在这项工作中,我们提出了NewsClaims,这是新闻领域中知识感知索赔检测的一个新基准。我们重新定义了索赔检测问题,使之包括提取与索赔相关的其他背景属性,并发布了在103篇新闻文章上标注的529条索赔。此外,NewsClaims的目标是在新出现的场景中对索赔检测系统进行基准测试,包括几乎没有甚至没有训练数据的未见主题。最后,我们为这个新基准提供了对各种零样本和基于提示的基线的综合评估。 摘要:Claim detection and verification are crucial for news understanding and have emerged as promising technologies for mitigating misinformation in news. However, most existing work focus on analysis of claim sentences while overlooking crucial background attributes, such as the claimer, claim objects, and other knowledge connected to the claim. In this work, we present NewsClaims , a new benchmark for knowledge-aware claim detection in the news domain. We re-define the claim detection problem to include extraction of additional background attributes related to the claim and release 529 claims annotated over 103 news articles. In addition, NewsClaims aims to benchmark claim detection systems in emerging scenarios, comprising unseen topics with little or no training data. Finally, we provide a comprehensive evaluation of various zero-shot and prompt-based baselines for this new benchmark.

【51】 Integrated Guidance and Control for Lunar Landing using a Stabilized Seeker 标题:稳定导引头用于月球着陆的综合制导与控制 链接:https://arxiv.org/abs/2112.08540

作者:Brian Gaudet,Roberto Furfaro 备注:Accepted for 2022 AIAA Scitech GN&C. arXiv admin note: text overlap with arXiv:2107.14764, arXiv:2004.09978, arXiv:2110.00634, arXiv:2109.03880 摘要:我们开发了一个综合制导和控制系统,该系统与稳定导引头和着陆点探测软件相结合,可以实现精确和安全的行星着陆。导引头通过调整导引头仰角和方位角来跟踪指定的着陆点,使指定的着陆点位于传感器视野的中心。导引头角度、接近速度和到指定着陆点的距离用于形成速度场,制导和控制系统使用该速度场在指定着陆点实现安全着陆。导航和控制系统将速度场、姿态和旋转速度直接映射到着陆器四个发动机的指令推力矢量。制导和控制系统作为一种策略,使用强化元学习进行优化。我们证明了制导和控制系统在动力下降阶段与多个转向兼容,并且对导引头滞后、执行器滞后和退化以及由燃油消耗引起的质心变化具有鲁棒性。我们概述了几种作战概念,包括使用预先放置的着陆信标的方法。 摘要:We develop an integrated guidance and control system that in conjunction with a stabilized seeker and landing site detection software can achieve precise and safe planetary landing. The seeker tracks the designated landing site by adjusting seeker elevation and azimuth angles to center the designated landing site in the sensor field of view. The seeker angles, closing speed, and range to the designated landing site are used to formulate a velocity field that is used by the guidance and control system to achieve a safe landing at the designated landing site. The guidance and control system maps this velocity field, attitude, and rotational velocity directly to a commanded thrust vector for the lander's four engines. The guidance and control system is implemented as a policy optimized using reinforcement meta learning. We demonstrate that the guidance and control system is compatible with multiple diverts during the powered descent phase, and is robust to seeker lag, actuator lag and degradation, and center of mass variation induced by fuel consumption. We outline several concepts of operations, including an approach using a preplaced landing beacon.

【52】 Text Mining Through Label Induction Grouping Algorithm Based Method 标题:基于标签归纳分组算法的文本挖掘方法 链接:https://arxiv.org/abs/2112.08486

作者:Gulshan Saleem,Nisar Ahmed,Usman Qamar 摘要:信息检索方法的主要重点是提供准确和高效的结果,而且这些结果也具有成本效益。LINGO(标签归纳分组算法)是一种聚类算法,旨在以质量聚类的形式提供搜索结果,但也有一些局限性。在本文中,我们的重点是获得更有意义的结果,并提高算法的整体性能。LINGO有两个主要步骤:使用潜在语义索引技术(LSI)进行聚类标签归纳,使用向量空间模型(VSM)进行聚类内容发现。由于LINGO在聚类内容发现中使用VSM,我们的任务是用LSI代替VSM进行聚类内容发现,并分析将LSI与Okapi BM25结合使用的可行性。下一个任务是将修改后方法的结果与LINGO原始方法进行比较。该研究应用于五个不同的文本数据集,以获得每种方法更可靠的结果。研究结果表明,当使用LSI进行内容发现时,LINGO可以产生40-50%的更好的结果。理论分析还表明,在聚类内容发现中用Okapi BM25作为LSI的评分方法(LSI+Okapi BM25)来代替VSM,与VSM和LSI的结果相比,在可扩展性和性能方面也能生成更好的聚类。 摘要:The main focus of information retrieval methods is to provide accurate and efficient results which are cost-effective too. LINGO (Label Induction Grouping Algorithm) is a clustering algorithm that aims to provide search results in form of quality clusters but also has a few limitations. In this paper, our focus is based on achieving results that are more meaningful and improving the overall performance of the algorithm. LINGO works on two main steps; Cluster Label Induction by using Latent Semantic Indexing technique (LSI) and Cluster content discovery by using the Vector Space Model (VSM). As LINGO uses VSM in cluster content discovery, our task is to replace VSM with LSI for cluster content discovery and to analyze the feasibility of using LSI with Okapi BM25. The next task is to compare the results of a modified method with the LINGO original method. The research is applied to five different text-based data sets to get more reliable results for every method. Research results show that LINGO produces 40-50% better results when using LSI for content Discovery. From theoretical evidence using Okapi BM25 for scoring method in LSI (LSI+Okapi BM25) for cluster content discovery instead of VSM, also results in better clusters generation in terms of scalability and performance when compares to both VSM and LSI's Results.
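文中设想在聚类内容发现中用 Okapi BM25 替代 VSM/LSI 的余弦打分。下面给出 BM25 打分函数的一个标准形式示意(简化实现,非论文代码;文档以词条列表表示,`k1`、`b` 取常用默认值):

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Okapi BM25 打分:按词频、逆文档频率与文档长度归一化为文档打分。"""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N                   # 平均文档长度
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)                 # 文档频率
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)     # 平滑 IDF
        tf = doc.count(t)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score
```

在聚类内容发现中,可将每个簇标签视为查询、对候选文档按此分数排序归入簇。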

【53】 The Need for Ethical, Responsible, and Trustworthy Artificial Intelligence for Environmental Sciences 标题:环境科学对道德的、负责任的和值得信赖的人工智能的需求 链接:https://arxiv.org/abs/2112.08453

作者:Amy McGovern,Imme Ebert-Uphoff,David John Gagne II,Ann Bostrom 摘要:鉴于人工智能（AI）和机器学习（ML）方法在环境科学各个方面的应用日益广泛，我们必须开始讨论AI的道德和负责任的使用。事实上，从人工智能引入的其他领域可以学到很多东西，这些领域往往是出于好意，但往往会导致意外的社会后果，如刑事司法系统中的硬编码种族偏见或通过金融系统加剧经济不平等。一个常见的误解是，在使用人工智能时，环境科学不会受到这些意外后果的影响，因为大多数数据来自观测，人工智能算法基于数学公式，而数学公式通常被视为客观的。在本文中，我们认为情况正好相反。通过具体的例子，我们展示了人工智能在环境科学中引入类似结果的许多方法。本文将促进这方面的讨论和研究工作。作为一个社区，我们应该避免通过引入人工智能在其他领域重复任何可预见的错误。事实上，如果采取适当的预防措施，人工智能可以成为帮助减少气候和环境不公的一个伟大工具。我们主要关注天气和气候的例子，但结论广泛应用于环境科学。 摘要:Given the growing use of Artificial Intelligence (AI) and machine learning (ML) methods across all aspects of environmental sciences, it is imperative that we initiate a discussion about the ethical and responsible use of AI. In fact, much can be learned from other domains where AI was introduced, often with the best of intentions, yet often led to unintended societal consequences, such as hard coding racial bias in the criminal justice system or increasing economic inequality through the financial system. A common misconception is that the environmental sciences are immune to such unintended consequences when AI is being used, as most data come from observations, and AI algorithms are based on mathematical formulas, which are often seen as objective. In this article, we argue the opposite can be the case. Using specific examples, we demonstrate many ways in which the use of AI can introduce similar consequences in the environmental sciences. This article will stimulate discussion and research efforts in this direction. As a community, we should avoid repeating any foreseeable mistakes made in other domains through the introduction of AI. In fact, with proper precautions, AI can be a great tool to help reduce climate and environmental injustice. We primarily focus on weather and climate examples but the conclusions apply broadly across the environmental sciences.

【54】 Positional Encoding Augmented GAN for the Assessment of Wind Flow for Pedestrian Comfort in Urban Areas 标题:位置编码增强型GAN用于城市地区行人舒适性的风流评价 链接:https://arxiv.org/abs/2112.08447

作者:Henrik Høiness,Kristoffer Gjerde,Luca Oggiano,Knut Erik Teigen Giljarhus,Massimiliano Ruocco 摘要:使用计算流体动力学(CFD)方法近似风场可能会非常耗时。创建用于交互式设计原型的工具,同时观察风流变化,需要更简单的模型来更快地模拟。深度学习中的数据驱动方法可能能够在很短的时间内给出类似的结果,而不是运行导致详细计算的数值近似。这项工作将使用CFD计算三维流场的问题重新表述为基于建筑物足迹的二维图像到图像转换问题,以预测行人高度水平的流场。我们研究了生成性对抗网络(GAN)的使用,如Pix2Pix[1]和CycleGAN[2],它们代表了各个领域中图像到图像转换任务的最新技术,以及U-Net自动编码器[3]。模型可以以数据驱动的方式了解数据集的基本分布,我们认为这有助于模型从CFD中了解基本的雷诺平均Navier-Stokes(RANS)方程。我们在不同的有高度信息和没有高度信息的三维断崖形建筑物上进行了新的模拟数据集实验。此外,我们对一系列模型的生成图像进行了广泛的定性和定量评估,并将其性能与CFD提供的模拟结果进行了比较。然后,我们展示了向输入中添加位置数据可以通过在不同的体系结构上注入此类信息来产生更准确的结果。此外,我们还表明,通过应用注意机制和频谱归一化来促进稳定的训练,模型的性能得到了提高。 摘要:Approximating wind flows using computational fluid dynamics (CFD) methods can be time-consuming. Creating a tool for interactively designing prototypes while observing the wind flow change requires simpler models to simulate faster. Instead of running numerical approximations resulting in detailed calculations, data-driven methods in deep learning might be able to give similar results in a fraction of the time. This work rephrases the problem from computing 3D flow fields using CFD to a 2D image-to-image translation-based problem on the building footprints to predict the flow field at pedestrian height level. We investigate the use of generative adversarial networks (GAN), such as Pix2Pix [1] and CycleGAN [2] representing state-of-the-art for image-to-image translation task in various domains as well as U-Net autoencoder [3]. The models can learn the underlying distribution of a dataset in a data-driven manner, which we argue can help the model learn the underlying Reynolds-averaged Navier-Stokes (RANS) equations from CFD. We experiment on novel simulated datasets on various three-dimensional bluff-shaped buildings with and without height information. Moreover, we present an extensive qualitative and quantitative evaluation of the generated images for a selection of models and compare their performance with the simulations delivered by CFD. 
We then show that adding positional data to the input can produce more accurate results by proposing a general framework for injecting such information into the different architectures. Furthermore, we show that the models' performances improve by applying attention mechanisms and spectral normalization to facilitate stable training.
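向输入注入位置信息的一种常见做法（CoordConv 风格，此处为示意性假设，并非论文原实现）是把归一化的 x/y 坐标作为额外通道拼接到输入图像上：

```python
import numpy as np

def add_positional_channels(image):
    """把归一化的 x/y 坐标通道拼接到 (H, W, C) 图像上,
    使网络可以显式地利用空间位置信息。"""
    h, w, _ = image.shape
    ys, xs = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w),
                         indexing="ij")       # ys 沿高度方向变化, xs 沿宽度方向
    return np.concatenate([image, xs[..., None], ys[..., None]], axis=-1)

footprint = np.zeros((64, 64, 1), dtype=np.float32)  # 假设的建筑足迹二值图
augmented = add_positional_channels(footprint)
print(augmented.shape)  # (64, 64, 3)
```

增强后的张量可以直接替换 Pix2Pix 或 U-Net 等架构的原始输入，而无需改动网络结构本身。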

【55】 Combating Collusion Rings is Hard but Possible 标题:打击串通团伙是困难的,但也是可能的 链接:https://arxiv.org/abs/2112.08444

作者:Niclas Boehmer,Robert Bredereck,André Nichterlein 备注:Accepted to AAAI'22 摘要:Littmann [Commun. ACM '21] 最近的一份报告概述了学术同行评议中共谋环的存在及其致命影响。我们介绍并分析了"无环评审"（Cycle-Free Reviewing）问题，其目的是找到不含以下共谋环的评审任务分配：一系列评审员，每个评审员评审序列中下一位评审员撰写的论文（最后一位评审员评审第一位评审员的论文），这样就形成了一个评审环，环中每位评审员都给出有利的评审。因此，该环内的所有论文都有很高的被接受的机会，与它们各自的科学价值无关。我们观察到，使用标准线性规划方法计算的评审任务分配通常包含许多短评审环。在负面方面，我们表明，在各种受限情形下（即，当每个作者都有资格评审所有论文，并且希望防止作者评审彼此或自己的论文时，或者当每个作者只有一篇论文，并且只有资格评审少数论文时），无环评审是NP难的。在积极的一面，除其他外，我们表明，在一些现实环境中，不含任何短评审环的任务分配总是存在的。这一结果也为计算（加权）无环评审任务分配提供了一种有效的启发式方法，我们在实践中证明了该方法的优异质量。 摘要:A recent report of Littmann [Commun. ACM '21] outlines the existence and the fatal impact of collusion rings in academic peer reviewing. We introduce and analyze the problem Cycle-Free Reviewing that aims at finding a review assignment without the following kind of collusion ring: A sequence of reviewers each reviewing a paper authored by the next reviewer in the sequence (with the last reviewer reviewing a paper of the first), thus creating a review cycle where each reviewer gives favorable reviews. As a result, all papers in that cycle have a high chance of acceptance independent of their respective scientific merit. We observe that review assignments computed using a standard Linear Programming approach typically admit many short review cycles. On the negative side, we show that Cycle-Free Reviewing is NP-hard in various restricted cases (i.e., when every author is qualified to review all papers and one wants to prevent that authors review each other's or their own papers or when every author has only one paper and is only qualified to review few papers). On the positive side, among others, we show that, in some realistic settings, an assignment without any review cycles of small length always exists. This result also gives rise to an efficient heuristic for computing (weighted) cycle-free review assignments, which we show to be of excellent quality in practice.
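检测一个评审任务分配中是否存在短评审环，可以用对每位评审员做深度受限 DFS 的方式实现。以下为示意性草图（数据结构与函数名为假设，仅用于说明问题定义）：

```python
def has_review_cycle(assignment, max_len):
    """assignment: 字典 reviewer -> 其所评审论文的作者集合。
    若存在长度不超过 max_len 的评审环则返回 True。
    递归深度以 max_len 为界, 故必然终止。"""
    def dfs(start, node, edges):
        for nxt in assignment.get(node, ()):
            if nxt == start:
                return True                       # 闭合成环, 环长 edges+1
            if edges + 1 < max_len and dfs(start, nxt, edges + 1):
                return True
        return False
    return any(dfs(r, r, 0) for r in assignment)

# A 评 B 的论文, B 评 A 的论文 -> 长度为 2 的共谋环
assignment = {"A": {"B"}, "B": {"A"}, "C": {"A"}}
print(has_review_cycle(assignment, max_len=2))  # True
```

论文的启发式做法正是在这类环检测的基础上，迭代修复线性规划给出的分配中出现的短环。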

【56】 Event-Aware Multimodal Mobility Nowcasting 标题:事件感知多模式移动性现在广播 链接:https://arxiv.org/abs/2112.08443

作者:Zhaonan Wang,Renhe Jiang,Hao Xue,Flora D. Salim,Xuan Song,Ryosuke Shibasaki 备注:Accepted by AAAI 2022 摘要:作为移动即服务(MaaS)成功的决定性部分,人群移动的时空预测建模是一项具有挑战性的任务,特别是考虑到社会事件导致移动行为偏离常态的场景。虽然通过深入学习在高水平时空规律建模方面取得了巨大进展,但大多数(如果不是所有的话)现有方法既不了解多种运输模式之间的动态相互作用,也不适应潜在社会事件带来的前所未有的波动。在本文中,我们从两个角度对规范时空网络(ST-Net)进行了改进:(1)设计一个异构移动信息网络(HMIN),以明确表示多模移动中的多模性;(2) 提出了一种内存增强动态滤波器生成器(MDFG),用于在各种场景中动态生成特定于序列的参数。增强的事件感知时空网络,即EAST网络,在多个真实世界数据集上进行评估,这些数据集具有广泛的社会事件种类和覆盖范围。定量和定性实验结果都验证了我们的方法与最新基线相比的优越性。代码和数据发布在https://github.com/underdoc-wang/EAST-Net. 摘要:As a decisive part in the success of Mobility-as-a-Service (MaaS), spatio-temporal predictive modeling for crowd movements is a challenging task particularly considering scenarios where societal events drive mobility behavior deviated from the normality. While tremendous progress has been made to model high-level spatio-temporal regularities with deep learning, most, if not all of the existing methods are neither aware of the dynamic interactions among multiple transport modes nor adaptive to unprecedented volatility brought by potential societal events. In this paper, we are therefore motivated to improve the canonical spatio-temporal network (ST-Net) from two perspectives: (1) design a heterogeneous mobility information network (HMIN) to explicitly represent intermodality in multimodal mobility; (2) propose a memory-augmented dynamic filter generator (MDFG) to generate sequence-specific parameters in an on-the-fly fashion for various scenarios. The enhanced event-aware spatio-temporal network, namely EAST-Net, is evaluated on several real-world datasets with a wide variety and coverage of societal events. Both quantitative and qualitative experimental results verify the superiority of our approach compared with the state-of-the-art baselines. Code and data are published on https://github.com/underdoc-wang/EAST-Net.

【57】 Utilizing XAI technique to improve autoencoder based model for computer network anomaly detection with shapley additive explanation(SHAP) 标题:利用XAI技术改进基于自动编码器的Shapley加性解释(Shap)计算机网络异常检测模型 链接:https://arxiv.org/abs/2112.08442

作者:Khushnaseeb Roshan,Aasim Zafar 备注:None 摘要:机器学习(ML)和深度学习(DL)方法被迅速采用,尤其是在计算机网络安全领域,如欺诈检测、网络异常检测、入侵检测等。然而,基于ML和DL的模型缺乏透明度是它们实现的一个主要障碍,并且由于其黑盒性质而受到批评,即使有如此巨大的结果。可解释人工智能(XAI)是一个很有前途的领域,它可以通过解释和解释模型的输出来提高模型的可信度。如果基于ML和DL的模型的内部工作是可以理解的,那么它可以进一步帮助改进其性能。本文的目的是展示如何使用XAI来解释DL模型的结果,在本例中是自动编码器。并在此基础上,改进了其在计算机网络异常检测中的性能。基于shapley值的核SHAP方法是一种新的特征选择技术。此方法仅用于识别实际导致攻击/异常实例集异常行为的特征。之后,这些特征集用于训练和验证自动编码器,但仅用于良性数据。最后,构建的SHAP_模型优于基于特征选择方法提出的其他两个模型。整个实验是在最新的CICIDS2017网络数据集的子集上进行的。SHAP_模型的总体准确度和AUC分别为94%和0.969。 摘要:Machine learning (ML) and Deep Learning (DL) methods are being adopted rapidly, especially in computer network security, such as fraud detection, network anomaly detection, intrusion detection, and much more. However, the lack of transparency of ML and DL based models is a major obstacle to their implementation and criticized due to its black-box nature, even with such tremendous results. Explainable Artificial Intelligence (XAI) is a promising area that can improve the trustworthiness of these models by giving explanations and interpreting its output. If the internal working of the ML and DL based models is understandable, then it can further help to improve its performance. The objective of this paper is to show that how XAI can be used to interpret the results of the DL model, the autoencoder in this case. And, based on the interpretation, we improved its performance for computer network anomaly detection. The kernel SHAP method, which is based on the shapley values, is used as a novel feature selection technique. This method is used to identify only those features that are actually causing the anomalous behaviour of the set of attack/anomaly instances. Later, these feature sets are used to train and validate the autoencoder but on benign data only. Finally, the built SHAP_Model outperformed the other two models proposed based on the feature selection method. This whole experiment is conducted on the subset of the latest CICIDS2017 network dataset. 
The overall accuracy and AUC of SHAP_Model are 94% and 0.969, respectively.
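论文采用基于 Shapley 值的核 SHAP 方法做特征选择。为说明 Shapley 值本身，下面给出一个在特征数极小时可行的精确计算草图（非核 SHAP 近似；其中的线性"模型"与取值均为假设）：

```python
from itertools import combinations
from math import factorial

def exact_shapley(predict, x, baseline):
    """对单个样本 x 精确计算各特征的 Shapley 值。
    predict: 接受特征向量的模型; baseline: 特征"缺失"时的替代值。
    复杂度 O(2^n), 仅适用于特征数很小的情形; 核 SHAP 即是对此的近似。"""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = [x[j] if j in subset or j == i else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += w * (predict(with_i) - predict(without_i))
    return phi

model = lambda v: 3 * v[0] + v[1]          # 假设的线性"模型"
phi = exact_shapley(model, x=[1.0, 1.0], baseline=[0.0, 0.0])
print(phi)  # [3.0, 1.0]
```

对线性模型，Shapley 值恰为各特征系数乘以其相对基线的偏移，这也可作为实现正确性的一个快速检验。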

【58】 Towards Explainable Artificial Intelligence in Banking and Financial Services 标题:走向银行和金融服务中的可解释人工智能 链接:https://arxiv.org/abs/2112.08441

作者:Ambreen Hanif 摘要:人工智能(AI)使机器能够从人类经验中学习,适应新的输入,并执行类似于人类的任务。人工智能发展迅速,正在改变企业运营方式,从流程自动化到任务认知增强和智能流程/数据分析。然而,人类用户面临的主要挑战是理解并适当信任人工智能算法和方法的结果。在本文中,为了应对这一挑战,我们研究和分析了最近在可解释人工智能(XAI)方法和工具方面所做的工作。我们介绍了一种新的XAI过程,它有助于生成可解释的模型,同时保持高水平的学习性能。我们提出了一种交互式的基于证据的方法来帮助人类用户理解和信任人工智能算法产生的结果和输出。我们采用银行领域的典型场景来分析客户交易。我们开发了一个数字仪表盘,以便于与算法结果进行交互,并讨论了拟议的XAI方法如何显著提高数据科学家理解AI算法结果的信心。 摘要:Artificial intelligence (AI) enables machines to learn from human experience, adjust to new inputs, and perform human-like tasks. AI is progressing rapidly and is transforming the way businesses operate, from process automation to cognitive augmentation of tasks and intelligent process/data analytics. However, the main challenge for human users would be to understand and appropriately trust the result of AI algorithms and methods. In this paper, to address this challenge, we study and analyze the recent work done in Explainable Artificial Intelligence (XAI) methods and tools. We introduce a novel XAI process, which facilitates producing explainable models while maintaining a high level of learning performance. We present an interactive evidence-based approach to assist human users in comprehending and trusting the results and output created by AI-enabled algorithms. We adopt a typical scenario in the Banking domain for analyzing customer transactions. We develop a digital dashboard to facilitate interacting with the algorithm results and discuss how the proposed XAI method can significantly improve the confidence of data scientists in understanding the result of AI-enabled algorithms.

【59】 Generalization Bounds for Stochastic Gradient Langevin Dynamics: A Unified View via Information Leakage Analysis 标题:随机梯度朗之万动力学的广义界:基于信息泄漏分析的统一观点 链接:https://arxiv.org/abs/2112.08439

作者:Bingzhe Wu,Zhicong Liang,Yatao Bian,ChaoChao Chen,Junzhou Huang,Yuan Yao 摘要:最近,利用随机梯度朗之万动力学(SGLD)对非凸经验风险最小化范式的推广界进行了广泛的研究。人们从不同的角度提出了一些理论框架来研究这个问题,如信息论和稳定性。在本文中,我们从隐私泄漏分析中提出了一个统一的观点来研究SGLD的泛化边界,并提供了一个理论框架,以简洁的方式重新推导先前的结果。除了理论发现之外,我们还进行了各种数值研究来实证评估SGLD的信息泄漏问题。此外,我们的理论和实证结果为先前研究SGLD成员隐私的工作提供了解释。 摘要:Recently, generalization bounds of the non-convex empirical risk minimization paradigm using Stochastic Gradient Langevin Dynamics (SGLD) have been extensively studied. Several theoretical frameworks have been presented to study this problem from different perspectives, such as information theory and stability. In this paper, we present a unified view from privacy leakage analysis to investigate the generalization bounds of SGLD, along with a theoretical framework for re-deriving previous results in a succinct manner. Aside from theoretical findings, we conduct various numerical studies to empirically assess the information leakage issue of SGLD. Additionally, our theoretical and empirical results provide explanations for prior works that study the membership privacy of SGLD.
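SGLD 的单步更新是在随机梯度下降上叠加与学习率相关的高斯噪声：θ ← θ − η∇L + √(2ηT)·ξ，其中 ξ∼N(0, I)。下面是一个示意性实现（变量名与温度参数 T 为假设）：

```python
import math
import random

def sgld_step(theta, grad, lr, temperature=1.0):
    """SGLD 单步更新: 梯度下降 + 与学习率成比例的高斯噪声注入。
    theta, grad: 参数与(小批量)梯度的列表; temperature: 温度 T。"""
    noise_scale = math.sqrt(2.0 * lr * temperature)
    return [t - lr * g + noise_scale * random.gauss(0.0, 1.0)
            for t, g in zip(theta, grad)]

random.seed(0)
theta = [1.0, -2.0]
theta = sgld_step(theta, grad=[0.5, -0.5], lr=0.01)
print(theta)
```

正是这一注入的噪声让 SGLD 的输出分布对单个训练样本不那么敏感，从而把泛化界分析与隐私泄漏（成员推断）分析联系起来。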

【60】 Programmatic Reward Design by Example 标题:基于实例的程序性奖励设计 链接:https://arxiv.org/abs/2112.08438

作者:Weichao Zhou,Wenchao Li 摘要:奖励设计是强化学习中的一个基本问题。错误指定或设计不当的奖励可能会导致低样本效率和不良行为。在本文中，我们提出了"程序性奖励设计"（programmatic reward design）的思想，即在RL环境中使用程序指定奖励函数。程序允许人类工程师以结构化和可解释的方式表达子目标和复杂任务场景。然而，程序性奖励设计的挑战在于，尽管人类可以提供高层次的结构，但正确设置低层次的细节，例如为特定子任务设置适当数量的奖励，仍然很困难。本文的主要贡献是一个概率框架，它可以从专家演示中推断出最佳候选程序性奖励函数。受最近生成性对抗方法的启发，我们的框架搜索最可能的程序性奖励函数——在该函数下，最优生成的轨迹无法与演示轨迹区分开来。实验结果表明，使用该框架学习的程序性奖励函数可以显著优于使用现有奖励学习算法学习的奖励函数，并使RL代理能够在高度复杂的任务上实现最先进的性能。 摘要:Reward design is a fundamental problem in reinforcement learning (RL). A misspecified or poorly designed reward can result in low sample efficiency and undesired behaviors. In this paper, we propose the idea of programmatic reward design, i.e. using programs to specify the reward functions in RL environments. Programs allow human engineers to express sub-goals and complex task scenarios in a structured and interpretable way. The challenge of programmatic reward design, however, is that while humans can provide the high-level structures, properly setting the low-level details, such as the right amount of reward for a specific sub-task, remains difficult. A major contribution of this paper is a probabilistic framework that can infer the best candidate programmatic reward function from expert demonstrations. Inspired by recent generative-adversarial approaches, our framework searches for the most likely programmatic reward function under which the optimally generated trajectories cannot be differentiated from the demonstrated trajectories. Experimental results show that programmatic reward functions learned using this framework can significantly outperform those learned using existing reward learning algorithms, and enable RL agents to achieve state-of-the-art performance on highly complex tasks.
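程序性奖励设计的核心是用程序结构表达子目标，而低层数值（如各子任务奖励大小）留待从演示中推断。以下是一个极简示意（状态表示、子目标与权重均为假设，并非论文原框架）：

```python
def make_programmatic_reward(goal, checkpoint, w_checkpoint=0.5, w_goal=1.0):
    """以程序形式表达的奖励函数: 工程师给出子目标结构,
    低层权重 w_* 可由演示数据推断(此处为假设的固定值)。"""
    visited = set()
    def reward(state):
        r = 0.0
        if state == checkpoint and checkpoint not in visited:
            visited.add(checkpoint)
            r += w_checkpoint          # 子目标奖励只发放一次
        if state == goal:
            r += w_goal                # 最终目标奖励
        return r
    return reward

reward = make_programmatic_reward(goal="G", checkpoint="C")
trajectory = ["S", "C", "C", "G"]
print([reward(s) for s in trajectory])  # [0.0, 0.5, 0.0, 1.0]
```

论文的概率框架所做的，正是在这类程序模板上搜索最可能解释专家轨迹的权重设置。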

【61】 DSGPT: Domain-Specific Generative Pre-Training of Transformers for Text Generation in E-commerce Title and Review Summarization 标题:DSGPT:电子商务标题和评论摘要中用于文本生成的转换器的特定领域生成性预训练(DSGPT:Domain-Specific Generative Pre-Training for Text Generate Text and Review Summary) 链接:https://arxiv.org/abs/2112.08414

作者:Xueying Zhang,Yunjiang Jiang,Yue Shang,Zhaomeng Cheng,Chi Zhang,Xiaochuan Fan,Yun Xiao,Bo Long 备注:None 摘要:我们提出了一种新的领域特定生成性预训练（DS-GPT）文本生成方法，并将其应用于电子商务移动显示屏上的产品标题和评论摘要问题。首先，我们采用了一种仅含解码器的Transformer架构，它通过将输入和输出拼接在一起而很适合微调任务。其次，我们证明了在相关领域仅利用少量的预训练数据是有效的。从一般语料库（如Wikipedia或Common Crawl）预训练语言模型需要大量的时间和资源投入，如果下游任务的种类有限，则可能会造成浪费。我们的DSGPT是在有限的数据集——中文短文本摘要数据集（LCSTS）上预先训练的。第三，我们的模型不需要产品相关的人工标注数据。对于标题摘要任务，最新技术在训练和预测阶段明确使用了额外的背景知识。相比之下，我们的模型隐式地捕获了这些知识，并在公共Taobao.com数据集上进行微调后，取得了比其他方法显著的改进。对于评论摘要任务，我们使用JD.com内部数据集，并观察到与缺乏微调灵活性的标准机器翻译方法相比的类似改进。我们提出的工作可以简单地扩展到其他领域，以实现广泛的文本生成任务。 摘要:We propose a novel domain-specific generative pre-training (DS-GPT) method for text generation and apply it to the product title and review summarization problems on E-commerce mobile display. First, we adopt a decoder-only transformer architecture, which fits well for fine-tuning tasks by combining input and output all together. Second, we demonstrate that utilizing only a small amount of pre-training data in related domains is powerful. Pre-training a language model from a general corpus such as Wikipedia or the Common Crawl requires tremendous time and resource commitment, and can be wasteful if the downstream tasks are limited in variety. Our DSGPT is pre-trained on a limited dataset, the Chinese short text summarization dataset (LCSTS). Third, our model does not require product-related human-labeled data. For the title summarization task, the state of the art explicitly uses additional background knowledge in training and predicting stages. In contrast, our model implicitly captures this knowledge and achieves significant improvement over other methods, after fine-tuning on the public Taobao.com dataset. For the review summarization task, we utilize a JD.com in-house dataset, and observe similar improvement over standard machine translation methods which lack the flexibility of fine-tuning. Our proposed work can be simply extended to other domains for a wide range of text generation tasks.
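仅含解码器架构之所以"适合微调"，是因为可以把输入与输出拼成一条序列，统一由语言模型处理，仅对输出部分计算损失。以下是一个示意性的样本构造草图（分隔符与函数名为假设，并非论文原实现）：

```python
def build_finetune_example(source, target, sep="<sep>", eos="<eos>"):
    """仅含解码器的 Transformer 微调样本: 将输入与输出拼接为单一序列。
    训练时只对 sep 之后的部分计算损失; 此处仅演示文本拼接,
    loss_start 以字符位置示意损失掩码的起点。"""
    text = f"{source} {sep} {target} {eos}"
    loss_start = text.index(sep) + len(sep)
    return text, loss_start

text, loss_start = build_finetune_example("产品原始标题", "精简摘要标题")
print(text)
```

推理时只需输入 `source <sep>` 前缀，让模型自回归地生成 `target` 直至产生结束符。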

【62】 Lifelong Generative Modelling Using Dynamic Expansion Graph Model 标题:基于动态扩展图模型的终身创成式建模 链接:https://arxiv.org/abs/2112.08370

作者:Fei Ye,Adrian G. Bors 备注:Accepted in Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI 2022) 摘要:变分自动编码器(VAE)在学习多个连续任务时,性能会退化。这是由灾难性的遗忘造成的。为了解决知识丢失问题,虚拟企业正在使用生成重放(GR)机制或扩展网络体系结构(ENA)。在本文中,我们使用GR和ENA联合方法,通过推导负边际对数似然上界来研究VAEs的遗忘行为。这一理论分析为VAEs如何在终身学习中忘记先前学到的知识提供了新的见解。分析表明,在ENA框架下,当考虑模型混合物时,在没有组件数量限制的情况下,达到了最佳性能。然而,基于ENA的方法可能需要过多的参数。这促使我们提出了一种新的动态扩展图模型(DEGM)。DEGM根据与每个新数据库相关的新颖性,与网络从以前的任务中已经学习到的信息相比较,扩展了其体系结构。DEGM训练优化了知识结构,描述了与过去和最近学习的任务相对应的联合概率表示。我们证明了DEGM保证了每个任务的最佳性能,同时也最小化了所需的参数数量。补充资料(SM)和源代码可在https://github.com/dtuzi123/Expansion-Graph-Model. 摘要:Variational Autoencoders (VAEs) suffer from degenerated performance, when learning several successive tasks. This is caused by catastrophic forgetting. In order to address the knowledge loss, VAEs are using either Generative Replay (GR) mechanisms or Expanding Network Architectures (ENA). In this paper we study the forgetting behaviour of VAEs using a joint GR and ENA methodology, by deriving an upper bound on the negative marginal log-likelihood. This theoretical analysis provides new insights into how VAEs forget the previously learnt knowledge during lifelong learning. The analysis indicates the best performance achieved when considering model mixtures, under the ENA framework, where there are no restrictions on the number of components. However, an ENA-based approach may require an excessive number of parameters. This motivates us to propose a novel Dynamic Expansion Graph Model (DEGM). DEGM expands its architecture, according to the novelty associated with each new databases, when compared to the information already learnt by the network from previous tasks. DEGM training optimizes knowledge structuring, characterizing the joint probabilistic representations corresponding to the past and more recently learned tasks. We demonstrate that DEGM guarantees optimal performance for each task while also minimizing the required number of parameters. 
Supplementary materials (SM) and source code are available at https://github.com/dtuzi123/Expansion-Graph-Model.

【63】 Feature-Attending Recurrent Modules for Generalization in Reinforcement Learning 标题:强化学习中泛化的特征参与递归模块 链接:https://arxiv.org/abs/2112.08369

作者:Wilka Carvalho,Andrew Lampinen,Kyriacos Nikiforou,Felix Hill,Murray Shanahan 摘要:深度强化学习（Deep-RL）最近在开发泛化算法方面取得了重大进展。然而，大多数算法针对单一类型的泛化设置。在这项工作中，我们研究了三种不同任务结构的泛化：（a）由规则发生的物体运动的空间和时间组成的任务；（b）由对规则出现的3D对象的主动感知和导航组成的任务；和（c）由记忆目标信息组成的任务，这些目标信息覆盖定期出现的对象配置序列。这些不同的任务结构都有一个组成性的基本概念：任务完成总是涉及到将任务导向的感知和行为的重复部分结合起来。我们假设，如果代理能够发现捕获这些重复任务段的表示，那么它可以在任务结构中进行泛化。对于我们的任务，这对应于用于识别单个对象运动、导航到三维对象以及在对象配置中导航的表示。受认知科学的启发，我们将代理人经验中反复出现的部分称为"知觉图式"。我们提出了特征注意循环模块（Feature Attending Recurrent Modules，FARM），它学习感知图式分布在多个相对较小的循环模块上的状态表示。我们将FARM与利用空间注意力的循环架构进行比较，后者将观察特征约简为空间位置上的加权平均。我们的实验表明，我们的特征注意机制能够更好地使FARM在我们所研究的各种以对象为中心的领域中进行泛化。 摘要:Deep reinforcement learning (Deep RL) has recently seen significant progress in developing algorithms for generalization. However, most algorithms target a single type of generalization setting. In this work, we study generalization across three disparate task structures: (a) tasks composed of spatial and temporal compositions of regularly occurring object motions; (b) tasks composed of active perception of and navigation towards regularly occurring 3D objects; and (c) tasks composed of remembering goal-information over sequences of regularly occurring object-configurations. These diverse task structures all share an underlying idea of compositionality: task completion always involves combining recurring segments of task-oriented perception and behavior. We hypothesize that an agent can generalize within a task structure if it can discover representations that capture these recurring task-segments. For our tasks, this corresponds to representations for recognizing individual object motions, for navigation towards 3D objects, and for navigating through object-configurations. Taking inspiration from cognitive science, we term representations for recurring segments of an agent's experience, "perceptual schemas". 
We propose Feature Attending Recurrent Modules (FARM), which learns a state representation where perceptual schemas are distributed across multiple, relatively small recurrent modules. We compare FARM to recurrent architectures that leverage spatial attention, which reduces observation features to a weighted average over spatial positions. Our experiments indicate that our feature-attention mechanism better enables FARM to generalize across the diverse object-centric domains we study.
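文中作为对照的空间注意力机制把观察特征约简为空间位置上的加权平均。下面用 numpy 勾勒这一操作（以特征范数代替学到的打分，属示意性假设）：

```python
import numpy as np

def spatial_attention_pool(features):
    """空间注意力: 打分经 softmax 归一化后,
    对各空间位置的特征做加权平均, 得到单个特征向量。
    features: (H, W, C); 真实模型中打分由网络学得, 此处用特征范数代替。"""
    h, w, c = features.shape
    scores = np.linalg.norm(features, axis=-1).reshape(-1)      # (H*W,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                    # softmax
    flat = features.reshape(-1, c)                              # (H*W, C)
    return weights @ flat                                       # (C,)

feats = np.random.default_rng(0).normal(size=(4, 4, 8))
pooled = spatial_attention_pool(feats)
print(pooled.shape)  # (8,)
```

FARM 的特征注意机制与之不同：它在特征维度（而非仅空间维度）上分配注意力，并把结果分摊到多个较小的循环模块中。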

【64】 Data Valuation for Vertical Federated Learning: An Information-Theoretic Approach 标题:垂直联合学习的数据评估:信息论方法 链接:https://arxiv.org/abs/2112.08364

作者:Xiao Han,Leye Wang,Junjie Wu 摘要:联邦学习（FL）是一种很有前途的机器学习范式，它以保护隐私和法律监管的方式为真实世界的AI应用程序实现跨党派数据协作。如何评估缔约方的数据是一个关键但具有挑战性的问题。在文献中，数据评估要么依赖于为给定任务运行特定模型，要么只是与任务无关；然而，在FL模型尚未确定的情况下，通常需要根据特定任务选择参与方。因此，这项工作填补了这一空白，提出了FedValue——据我们所知，这是第一种针对垂直FL任务的隐私保护、任务特定但无模型的数据评估方法。具体而言，FedValue采用了一种称为Shapley-CMI的新的信息论指标，从博弈论的角度评估多方的数据值。此外，设计了一种新的服务器辅助联邦计算机制来计算Shapley-CMI，同时保护各方免受数据泄漏。我们还提出了几种在实际中加速Shapley-CMI计算的技术。在六个开放数据集上的大量实验验证了FedValue在垂直FL任务数据评估中的有效性和效率。特别是，Shapley-CMI作为一种无模型度量，其性能与依赖于运行一组性能良好的模型的度量相当。 摘要:Federated learning (FL) is a promising machine learning paradigm that enables cross-party data collaboration for real-world AI applications in a privacy-preserving and law-regulated way. How to valuate parties' data is a critical but challenging FL issue. In the literature, data valuation either relies on running specific models for a given task or is just task irrelevant; however, it is often requisite for party selection given a specific task when FL models have not been determined yet. This work thus fills the gap and proposes FedValue, to our best knowledge, the first privacy-preserving, task-specific but model-free data valuation method for vertical FL tasks. Specifically, FedValue incorporates a novel information-theoretic metric termed Shapley-CMI to assess data values of multiple parties from a game-theoretic perspective. Moreover, a novel server-aided federated computation mechanism is designed to compute Shapley-CMI and meanwhile protects each party from data leakage. We also propose several techniques to accelerate Shapley-CMI computation in practice. Extensive experiments on six open datasets validate the effectiveness and efficiency of FedValue for data valuation of vertical FL tasks. In particular, Shapley-CMI as a model-free metric performs comparably with the measures that depend on running an ensemble of well-performing models.

【65】 How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy 标题:如何学习和表现抽象:运用符号炼金术的研究 链接:https://arxiv.org/abs/2112.08360

作者:Badr AlKhamissi,Akshay Srinivasan,Zeb-Kurth Nelson,Sam Ritter 备注:Preprint 摘要:Alchemy是一个新的元学习环境,它足够丰富,可以包含有趣的抽象,但也足够简单,可以使细粒度分析变得易于处理。此外,Alchemy提供了一个可选的符号接口,使meta RL研究无需大量计算预算。在这项工作中,我们采取了第一步,使用符号炼金术来确定设计选择,使深度RL代理能够学习各种类型的抽象。然后,通过各种行为和内省分析,我们调查了我们训练有素的代理如何使用和表示抽象任务变量,并发现了与抽象神经科学的有趣联系。最后,我们讨论了使用meta RL和炼金术更好地理解抽象变量在大脑中的表现的下一步。 摘要:Alchemy is a new meta-learning environment rich enough to contain interesting abstractions, yet simple enough to make fine-grained analysis tractable. Further, Alchemy provides an optional symbolic interface that enables meta-RL research without a large compute budget. In this work, we take the first steps toward using Symbolic Alchemy to identify design choices that enable deep-RL agents to learn various types of abstraction. Then, using a variety of behavioral and introspective analyses we investigate how our trained agents use and represent abstract task variables, and find intriguing connections to the neuroscience of abstraction. We conclude by discussing the next steps for using meta-RL and Alchemy to better understand the representation of abstract variables in the brain.

【66】 Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification 标题:用于自监督说话人确认的Bootstrap均衡和概率说话人表征学习 链接:https://arxiv.org/abs/2112.08929

作者:Sung Hwan Mun,Min Hyun Han,Dongjune Lee,Jihwan Kim,Nam Soo Kim 备注:Accepted by IEEE Access 摘要:在本文中，我们提出了自监督说话人表示学习策略，包括前端的bootstrap均衡说话人表示学习和后端的不确定性感知概率说话人嵌入训练。在前端阶段，我们通过带均匀性正则化项的自举训练方案学习说话人表示。在后端阶段，通过最大化属于同一说话人的语音样本之间的相互似然得分来估计概率说话人嵌入，这不仅提供说话人表示，而且提供数据不确定性。实验结果表明，所提出的bootstrap均衡训练策略能够有效地帮助学习说话人表征，并优于传统的基于对比学习的方法。此外，我们还证明了集成的两阶段框架进一步提高了VoxCeleb1测试集在EER和MinDCF方面的说话人验证性能。 摘要:In this paper, we propose self-supervised speaker representation learning strategies, which comprise of a bootstrap equilibrium speaker representation learning in the front-end and an uncertainty-aware probabilistic speaker embedding training in the back-end. In the front-end stage, we learn the speaker representations via the bootstrap training scheme with the uniformity regularization term. In the back-end stage, the probabilistic speaker embeddings are estimated by maximizing the mutual likelihood score between the speech samples belonging to the same speaker, which provide not only speaker representations but also data uncertainty. Experimental results show that the proposed bootstrap equilibrium training strategy can effectively help learn the speaker representations and outperforms the conventional methods based on contrastive learning. Also, we demonstrate that the integrated two-stage framework further improves the speaker verification performance on the VoxCeleb1 test set in terms of EER and MinDCF.

【67】 Characterization of causal ancestral graphs for time series with latent confounders 标题:含潜在混杂因素的时间序列因果祖图的刻画 链接:https://arxiv.org/abs/2112.08417

作者:Andreas Gerhardus 备注:55 pages (including appendix), 16 figures 摘要:推广有向最大祖先图,我们引入了一类图形模型,用于表示具有未观测变量的多元时间序列的有限多个定期采样和定期次采样时间步之间的时滞特定因果关系和独立性。我们完全描述了这些图,并表明它们包含的约束超出了先前文献中考虑的约束。这允许在没有附加假设的情况下进行更强的因果推断。在有向部分祖先图的推广中,我们进一步介绍了新类型图的马尔可夫等价类的图形表示,并表明它们比当前最先进的因果发现算法所学的知识更丰富。我们还分析了通过增加观察到的时间步数获得的附加信息。 摘要:Generalizing directed maximal ancestral graphs, we introduce a class of graphical models for representing time lag specific causal relationships and independencies among finitely many regularly sampled and regularly subsampled time steps of multivariate time series with unobserved variables. We completely characterize these graphs and show that they entail constraints beyond those that have previously been considered in the literature. This allows for stronger causal inferences without having imposed additional assumptions. In generalization of directed partial ancestral graphs we further introduce a graphical representation of Markov equivalence classes of the novel type of graphs and show that these are more informative than what current state-of-the-art causal discovery algorithms learn. We also analyze the additional information gained by increasing the number of observed time steps.

【68】 AGMI: Attention-Guided Multi-omics Integration for Drug Response Prediction with Graph Neural Networks 标题:AGMI:注意力引导的多组学集成图神经网络药物反应预测 链接:https://arxiv.org/abs/2112.08366

作者:Feng Ruiwei,Xie Yufeng,Lai Minshan,Chen Danny,Cao Ji,Wu Jian 摘要:准确的药物反应预测(DRP)是精确医学中一项关键而富有挑战性的任务。本文提出了一种新的用于DRP的注意引导多组学整合(AGMI)方法,该方法首先为每个细胞系构建一个多边缘图(MeG),然后使用一种称为图形边缘感知网络(GeNet)的新结构聚集多组学特征以预测药物反应。我们的AGMI方法首次探索了基于基因约束的多组学整合,利用GNNs将DRP与整个基因组进行整合。在CCLE和GDSC数据集上的实证实验表明,我们的AGMI在四个指标上大大优于最先进的DRP方法8.3%-34.2%。我们的数据和代码可在https://github.com/yivan-WYYGDSG/AGMI. 摘要:Accurate drug response prediction (DRP) is a crucial yet challenging task in precision medicine. This paper presents a novel Attention-Guided Multi-omics Integration (AGMI) approach for DRP, which first constructs a Multi-edge Graph (MeG) for each cell line, and then aggregates multi-omics features to predict drug response using a novel structure, called Graph edge-aware Network (GeNet). For the first time, our AGMI approach explores gene constraint based multi-omics integration for DRP with the whole-genome using GNNs. Empirical experiments on the CCLE and GDSC datasets show that our AGMI largely outperforms state-of-the-art DRP methods by 8.3%--34.2% on four metrics. Our data and code are available at https://github.com/yivan-WYYGDSG/AGMI.