
Statistics arXiv Daily Digest [11.8]

2023-03-14 22:51:36


stat (Statistics): 46 papers in total

【1】 Improved inference for doubly robust estimators of heterogeneous treatment effects
Link: https://arxiv.org/abs/2111.03594

Authors: Heejun Shin, Joseph Antonelli
Affiliations: Department of Statistics, University of Florida
Abstract: We propose a doubly robust approach to characterizing treatment effect heterogeneity in observational studies. We utilize posterior distributions for both the propensity score and outcome regression models to provide valid inference on the conditional average treatment effect even when high-dimensional or nonparametric models are used. We show that our approach leads to conservative inference in finite samples or under model misspecification, and provides a consistent variance estimator when both models are correctly specified. In simulations, we illustrate the utility of these results in difficult settings such as high-dimensional covariate spaces or highly flexible models for the propensity score and outcome regression. Lastly, we analyze environmental exposure data from NHANES to identify how the effects of these exposures vary by subject-level characteristics.

【2】 Predicting Antimicrobial Resistance in the Intensive Care Unit
Link: https://arxiv.org/abs/2111.03575

Authors: Taiyao Wang, Kyle R. Hansen, Joshua Loving, Ioannis Ch. Paschalidis, Helen van Aggelen, Eran Simhon
Abstract: Antimicrobial resistance (AMR) is a risk for patients and a burden for the healthcare system. However, AMR assays typically take several days. This study develops predictive models for AMR based on easily available clinical and microbiological predictors, including patient demographics, hospital stay data, diagnoses, clinical features, and microbiological/antimicrobial characteristics, and compares those models to a naive antibiogram-based model using only microbiological/antimicrobial characteristics. The ability to predict resistance accurately prior to culturing could inform clinical decision-making and shorten time to action. The machine learning algorithms employed here show improved classification performance (area under the receiver operating characteristic curve 0.88-0.89) versus the naive model (area under the receiver operating characteristic curve 0.86) for 6 organisms and 10 antibiotics using the Philips eICU Research Institute (eRI) database. This method can help guide antimicrobial treatment, with the objective of improving patient outcomes and reducing the usage of unnecessary or ineffective antibiotics.

【3】 Why the 1-Wasserstein distance is the area between the two marginal CDFs
Link: https://arxiv.org/abs/2111.03570

Authors: Marco De Angelis, Ander Gray
Affiliations: Institute for Risk and Uncertainty, University of Liverpool
Note: 6 pages, 1 figure, a pedagogical note
Abstract: We elucidate why the 1-Wasserstein distance $W_1$ coincides with the area between the two marginal cumulative distribution functions (CDFs). We first describe the Wasserstein distance in terms of copulas, and then show that $W_1$ with the Euclidean distance is attained with the $M$ copula. Two random variables whose dependence is given by the $M$ copula manifest perfect (positive) dependence. If we express the random variables in terms of their CDFs, it is intuitive to see that the distance between two such random variables coincides with the area between the two CDFs.
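The identity in this abstract is easy to check numerically. A minimal sketch with numpy, assuming two samples of equal size: the $M$-copula (comonotone) coupling pairs the $i$-th order statistic of one sample with the $i$-th order statistic of the other, and its transport cost matches the area between the two empirical CDFs.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)
y = rng.normal(0.5, 1.5, 1000)

# Route 1: the M copula couples the samples comonotonically, pairing order
# statistics; for equal sample sizes W_1 is the mean absolute difference.
w1_coupling = np.mean(np.abs(np.sort(x) - np.sort(y)))

# Route 2: the area between the two empirical CDFs, on a fine grid.
grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), 200001)
Fx = np.searchsorted(np.sort(x), grid, side="right") / x.size
Fy = np.searchsorted(np.sort(y), grid, side="right") / y.size
w1_area = np.abs(Fx - Fy).sum() * (grid[1] - grid[0])

print(w1_coupling, w1_area)  # the two routes agree up to grid resolution
```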

【4】 On the effective dimension and multilevel Monte Carlo
Link: https://arxiv.org/abs/2111.03561

Authors: Nabil Kahalé
Affiliations: ESCP Business School
Abstract: I consider the problem of integrating a function $f$ over the $d$-dimensional unit cube. I describe a multilevel Monte Carlo method that estimates the integral with variance at most $\epsilon^{2}$ in $O(d+\ln(d)d_{t}\epsilon^{-2})$ time, for $\epsilon>0$, where $d_{t}$ is the truncation dimension of $f$. In contrast, the standard Monte Carlo method typically achieves such variance in $O(d\epsilon^{-2})$ time. A lower bound of order $d+d_{t}\epsilon^{-2}$ is described for a class of multilevel Monte Carlo methods.
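The speed-up comes from spending most samples on cheap truncated versions of $f$ and only a few on the full function. A toy sketch of the telescoping idea (the integrand, level sizes, and sample allocations below are invented for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50
# Toy integrand: weights decay fast, so the "truncation dimension" is small.
w = 1.0 / (np.arange(1, d + 1) ** 3)

def f_trunc(u, k):
    # Approximation of f using only the first k coordinates.
    return np.sin(u[:, :k] @ w[:k])

# Telescoping levels k_0 < k_1 < ... < k_L = d:
# E[f] = E[f_{k_0}] + sum_l E[f_{k_l} - f_{k_{l-1}}].
ks = [2, 5, 15, d]
samples = [8000, 2000, 500, 200]  # fewer samples on the expensive levels

est = 0.0
for level, (k, n) in enumerate(zip(ks, samples)):
    u = rng.random((n, d))
    coarse = f_trunc(u, ks[level - 1]) if level > 0 else 0.0
    est += np.mean(f_trunc(u, k) - coarse)

# Plain Monte Carlo reference on the full function.
u = rng.random((100000, d))
ref = np.mean(f_trunc(u, d))
print(est, ref)  # close: the correction terms have tiny variance
```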

【5】 Contextual Bayesian optimization with binary outputs
Link: https://arxiv.org/abs/2111.03447

Authors: Tristan Fauvel, Matthew Chalk
Affiliations: Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
Abstract: Bayesian optimization (BO) is an efficient method to optimize expensive black-box functions. It has been generalized to scenarios where objective function evaluations return stochastic binary feedback, such as success/failure in a given test, or preference between different parameter settings. In many real-world situations, the objective function can be evaluated in controlled 'contexts' or 'environments' that directly influence the observations. For example, one could directly alter the 'difficulty' of the test that is used to evaluate a system's performance. With binary feedback, the context determines the information obtained from each observation. For example, if the test is too easy/hard, the system will always succeed/fail, yielding uninformative binary outputs. Here we combine ideas from Bayesian active learning and optimization to efficiently choose the best context and optimization parameter on each iteration. We demonstrate the performance of our algorithm and illustrate how it can be used to tackle a concrete application in visual psychophysics: efficiently improving patients' vision via corrective lenses, using psychophysics measurements.
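The point about uninformative contexts can be quantified: the expected information carried by one Bernoulli observation is the binary entropy of its success probability, which vanishes as the test becomes too easy or too hard. A small illustration of that intuition (not the paper's acquisition function):

```python
import math

def binary_entropy(p):
    """Expected information (in bits) of one Bernoulli observation."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# A test passed ~half the time is maximally informative; one passed (or
# failed) almost always yields nearly no information per observation.
for p in (0.01, 0.5, 0.99):
    print(p, binary_entropy(p))
```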

【6】 Liu Estimator in the Multinomial Logistic Regression Model
Link: https://arxiv.org/abs/2111.03398

Authors: Yasin Asar, Murat Erişoğlu
Affiliations: Department of Mathematics and Computer Sciences, Necmettin Erbakan University; Department of Statistics, Necmettin Erbakan University, Konya, Turkey
Abstract: This paper considers the Liu estimator in the multinomial logistic regression model. We propose some different estimators of the biasing parameter. The mean square error (MSE) is considered as the performance criterion. In order to compare the performance of the estimators, we performed a Monte Carlo simulation study. According to the results of the simulation study, we found that increasing the correlation between the independent variables and the number of regressors has a negative effect on the MSE. However, when the sample size increases, the MSE decreases even when the correlation between the independent variables is large. Based on the minimum MSE criterion, some useful estimators for estimating the biasing parameter d are recommended for the practitioners.
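For orientation, a sketch of the Liu estimator in its familiar linear-regression form, $\hat\beta(d) = (X'X + I)^{-1}(X'X + dI)\hat\beta_{\mathrm{OLS}}$. The paper's multinomial logistic version replaces the OLS quantities with their maximum-likelihood counterparts, so this illustrates the shrinkage idea only; the design below is invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 5
# Correlated design, mimicking the multicollinearity setting of the paper.
base = rng.normal(size=(n, 1))
X = 0.9 * base + 0.1 * rng.normal(size=(n, p))
beta = np.ones(p)
y = X @ beta + rng.normal(size=n)

XtX = X.T @ X
beta_ols = np.linalg.solve(XtX, X.T @ y)

def liu(d):
    # Liu estimator, linear-model form: (X'X + I)^{-1} (X'X + d I) beta_ols
    I = np.eye(p)
    return np.linalg.solve(XtX + I, (XtX + d * I) @ beta_ols)

# d = 1 recovers OLS exactly; d < 1 shrinks toward a ridge-like solution,
# which is what the biasing-parameter choice trades off against bias.
assert np.allclose(liu(1.0), beta_ols)
print(liu(0.5))
```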

【7】 On the relevance of prognostic information for clinical trials: A theoretical quantification
Link: https://arxiv.org/abs/2111.03391

Authors: Sandra Siegfried, Stephen Senn, Torsten Hothorn
Affiliations: Universität Zürich; University of Sheffield
Abstract: The question of how individual patient data from cohort studies or historical clinical trials can be leveraged for designing more powerful, or smaller yet equally powerful, clinical trials becomes increasingly important in the era of digitalisation. Today, traditional statistical analysis approaches may seem questionable to practitioners in light of ubiquitous historical covariate information. Several methodological developments aim at incorporating historical information in the design and analysis of future clinical trials, most importantly Bayesian information borrowing, propensity score methods, stratification, and covariate adjustment. Recently, adjusting the analysis with respect to a prognostic score, which was obtained from some machine learning procedure applied to historical data, has been suggested, and we study the potential of this approach for randomised clinical trials. In an idealised situation of a normal outcome in a two-arm trial with 1:1 allocation, we derive a simple sample size reduction formula as a function of two criteria characterising the prognostic score: (1) the coefficient of determination $R^2$ on historical data and (2) the correlation $\rho$ between the estimated and the true unknown prognostic scores. While maintaining the same power, the original total sample size $n$ planned for the unadjusted analysis reduces to $(1 - R^2\rho^2) \times n$ in an adjusted analysis. Robustness in less ideal situations was assessed empirically. We conclude that there is potential for substantially more powerful or smaller trials, but only when prognostic scores can be accurately estimated.
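The reduction formula from the abstract is simple enough to use directly when planning a trial; the numbers below are hypothetical inputs:

```python
def adjusted_sample_size(n, r2, rho):
    """Total sample size after adjusting for a prognostic score, per the
    formula in the abstract: (1 - R^2 * rho^2) * n, where r2 is the
    coefficient of determination on historical data and rho the correlation
    between estimated and true prognostic scores."""
    return (1 - r2 * rho ** 2) * n

# Example: a well-estimated score (rho = 0.95) explaining half the outcome
# variance (R^2 = 0.5) cuts a planned n = 1000 by about 45%.
print(adjusted_sample_size(1000, 0.5, 0.95))  # approximately 548.75
```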

【8】 Divide-and-Conquer Hard-thresholding Rules in High-dimensional Imbalanced Classification
Link: https://arxiv.org/abs/2111.03306

Authors: Arezou Mojiri, Abbas Khalili, Ali Zeinal Hamadani
Affiliations: Department of Mathematical Sciences, Isfahan University of Technology; Department of Statistics and Mathematics, McGill University; Department of Industrial and Systems Engineering, Isfahan University of Technology
Note: 62 pages, 4 figures, 8 tables, 4 appendices A-D, 7 sections
Abstract: In binary classification, imbalance refers to situations in which one class is heavily under-represented. This issue arises either from the data collection process or because one class is indeed rare in the population. Imbalanced classification frequently arises in applications such as biology, medicine, engineering, and social sciences. In this manuscript, for the first time, we theoretically study the impact of imbalanced class sizes on linear discriminant analysis (LDA) in high dimensions. We show that due to data scarcity in one class, referred to as the minority class, and the high-dimensionality of the feature space, the LDA ignores the minority class, yielding a maximum misclassification rate. We then propose a new construction of a hard-thresholding rule based on a divide-and-conquer technique that reduces the large difference between the misclassification rates. We show that the proposed method is asymptotically optimal. We further study two well-known sparse versions of the LDA in imbalanced cases. We evaluate the finite-sample performance of different methods using simulations and by analyzing two real data sets. The results show that our method either outperforms its competitors or has comparable performance based on a much smaller subset of selected features, while being computationally more efficient.

【9】 Optimality of variational inference for stochastic block model with missing links
Link: https://arxiv.org/abs/2111.03305

Authors: Solenne Gaucher, Olga Klopp
Affiliations: Laboratoire de Mathématiques d'Orsay, Université Paris-Saclay; ESSEC Business School; CREST, ENSAE
Abstract: Variational methods are extremely popular in the analysis of network data. Statistical guarantees obtained for these methods typically provide asymptotic normality for the problem of estimation of global model parameters under the stochastic block model. In the present work, we consider the case of networks with missing links, which is important in applications, and show that the variational approximation to the maximum likelihood estimator converges at the minimax rate. This provides the first minimax optimal and tractable estimator for the problem of parameter estimation for the stochastic block model with missing links. We complement our results with numerical studies of simulated and real networks, which confirm the advantages of this estimator over current methods.

【10】 Maillard Sampling: Boltzmann Exploration Done Optimally
Link: https://arxiv.org/abs/2111.03290

Authors: Jie Bian, Kwang-Sung Jun
Affiliations: University of Arizona
Abstract: The PhD thesis of Maillard (2013) presents a randomized algorithm for the $K$-armed bandit problem. This less-known algorithm, which we call Maillard sampling (MS), computes the probability of choosing each arm in a closed form, which is useful for counterfactual evaluation from bandit-logged data but was lacking from Thompson sampling, a widely-adopted bandit algorithm in the industry. Motivated by such merit, we revisit MS and perform an improved analysis to show that it achieves both asymptotic optimality and a $\sqrt{KT\log T}$ minimax regret bound, where $T$ is the time horizon, which matches the standard asymptotically optimal UCB's performance. We then propose a variant of MS called MS$^+$ that improves its minimax bound to $\sqrt{KT\log K}$ without losing the asymptotic optimality. MS$^+$ can also be tuned to be aggressive (i.e., less exploration) without losing theoretical guarantees, a unique feature unavailable from existing bandit algorithms. Our numerical evaluation shows the effectiveness of MS$^+$.
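A hedged sketch of the closed-form arm probabilities: as I understand MS for 1-sub-Gaussian rewards, an arm's probability decays exponentially in its pull count times its squared empirical gap to the best arm; consult the paper for the exact form and tuning. The counts and means below are invented:

```python
import math

def maillard_probs(counts, means):
    """Sketch of Maillard-sampling arm probabilities:
    p_a proportional to exp(-N_a * gap_a^2 / 2), where gap_a is the
    empirical gap to the best arm (assumed form; see the paper)."""
    best = max(means)
    weights = [math.exp(-n * (best - m) ** 2 / 2)
               for n, m in zip(counts, means)]
    total = sum(weights)
    return [w / total for w in weights]

# Arms pulled often with a large empirical gap get exponentially small
# mass; the empirical best arm always has the largest probability. These
# closed-form probabilities are what enables counterfactual evaluation.
p = maillard_probs(counts=[50, 50, 10], means=[0.7, 0.5, 0.6])
print(p)
```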

【11】 Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs
Link: https://arxiv.org/abs/2111.03289

Authors: Yeoneung Kim, Insoon Yang, Kwang-Sung Jun
Affiliations: Seoul National University; University of Arizona
Abstract: In online learning problems, exploiting low variance plays an important role in obtaining tight performance guarantees, yet is challenging because variances are often not known a priori. Recently, considerable progress has been made by Zhang et al. (2021), who obtain a variance-adaptive regret bound for linear bandits without knowledge of the variances and a horizon-free regret bound for linear mixture Markov decision processes (MDPs). In this paper, we present novel analyses that improve their regret bounds significantly. For linear bandits, we achieve $\tilde O(d^{1.5}\sqrt{\sum_{k=1}^K \sigma_k^2} + d^2)$ where $d$ is the dimension of the features, $K$ is the time horizon, and $\sigma_k^2$ is the noise variance at time step $k$, and $\tilde O$ ignores polylogarithmic dependence, which is a factor of $d^3$ improvement. For linear mixture MDPs, we achieve a horizon-free regret bound of $\tilde O(d^{1.5}\sqrt{K} + d^3)$ where $d$ is the number of base models and $K$ is the number of episodes. This is a factor of $d^3$ improvement in the leading term and $d^6$ in the lower order term. Our analysis critically relies on a novel elliptical potential `count' lemma. This lemma allows a peeling-based regret analysis, which can be of independent interest.

【12】 Recurrent Neural Networks for Learning Long-term Temporal Dependencies with Reanalysis of Time Scale Representation
Link: https://arxiv.org/abs/2111.03282

Authors: Kentaro Ohno, Atsutoshi Kumagai
Affiliations: NTT Computer and Data Science Laboratories
Note: 8 pages, 5 figures, IEEE ICBK 2021
Abstract: Recurrent neural networks with a gating mechanism such as an LSTM or GRU are powerful tools to model sequential data. In the mechanism, a forget gate, which was introduced to control information flow in a hidden state in the RNN, has recently been re-interpreted as a representative of the time scale of the state, i.e., a measure of how long the RNN retains information on inputs. On the basis of this interpretation, several parameter initialization methods to exploit prior knowledge on temporal dependencies in data have been proposed to improve learnability. However, the interpretation relies on various unrealistic assumptions, such as that there are no inputs after a certain time point. In this work, we reconsider this interpretation of the forget gate in a more realistic setting. We first generalize the existing theory on gated RNNs so that we can consider the case where inputs are successively given. We then argue that the interpretation of a forget gate as a temporal representation is valid when the gradient of loss with respect to the state decreases exponentially as time goes back. We empirically demonstrate that existing RNNs satisfy this gradient condition at the initial training phase on several tasks, which is in good agreement with previous initialization methods. On the basis of this finding, we propose an approach to construct new RNNs that can represent a longer time scale than conventional models, which will improve the learnability for long-term sequential data. We verify the effectiveness of our method by experiments with real-world datasets.

【13】 Local Asymptotic Normality and Optimal Estimation of low-rank Quantum Systems
Link: https://arxiv.org/abs/2111.03279

Authors: Samriddha Lahiry, Michael Nussbaum
Affiliations: Department of Statistics and Data Science, Cornell University; Department of Mathematics, Cornell University
Abstract: In classical statistics, a statistical experiment consisting of $n$ i.i.d. observations from $d$-dimensional multinomial distributions can be well approximated by a $(d-1)$-dimensional Gaussian distribution. In a quantum version of the result, it has been shown that a collection of $n$ qudits of full rank can be well approximated by a quantum system containing a classical part, which is a $(d-1)$-dimensional Gaussian distribution, and a quantum part containing an ensemble of $d(d-1)/2$ shifted thermal states. In this paper, we obtain a generalization of this result when the qudits are not of full rank. We show that when the rank of the qudits is $r$, the limiting experiment consists of an $(r-1)$-dimensional Gaussian distribution and an ensemble of both shifted pure and shifted thermal states. We also outline a two-stage procedure for the estimation of the low-rank qudit, where we obtain an estimator which is sharp minimax optimal. For the estimation of a linear functional of the quantum state, we construct an estimator, analyze the risk and use quantum LAN to show that our estimator is also optimal in the minimax sense.

【14】 Test of Weak Separability for Spatially Stationary Functional Field
Link: https://arxiv.org/abs/2111.03252

Authors: Decai Liang, Hui Huang, Yongtao Guan, Fang Yao
Affiliations: School of Statistics and Data Science, Nankai University, China; School of Mathematics, Sun Yat-sen University, China; Department of Management Science, University of Miami, USA
Abstract: For spatially dependent functional data, a generalized Karhunen-Loève expansion is commonly used to decompose data into an additive form of temporal components and spatially correlated coefficients. This structure provides a convenient model to investigate the space-time interactions, but may not hold for complex spatio-temporal processes. In this work, we introduce the concept of weak separability, and propose a formal test to examine its validity for a non-replicated spatially stationary functional field. The asymptotic distribution of the test statistic that adapts to potentially diverging ranks is derived by constructing lag covariance estimation, which is easy to compute for practical implementation. We demonstrate the efficacy of the proposed test via simulations and illustrate its usefulness in two real examples: China PM$_{2.5}$ data and Harvard Forest data.

【15】 Quantile index regression
Link: https://arxiv.org/abs/2111.03223

Authors: Yingying Zhang, Yuefeng Si, Guodong Li, Chih-Ling Tsai
Affiliations: East China Normal University; University of Hong Kong; University of California at Davis
Abstract: Estimating the structures at high or low quantiles has become an important subject and attracted increasing attention across numerous fields. However, due to data sparsity at tails, it usually is a challenging task to obtain reliable estimation, especially for high-dimensional data. This paper suggests a flexible parametric structure for tails, which enables us to conduct the estimation at quantile levels with rich observations and then to extrapolate the fitted structures to far tails. The proposed model depends on some quantile indices and hence is called the quantile index regression. Moreover, the composite quantile regression method is employed to obtain non-crossing quantile estimators, and this paper further establishes their theoretical properties, including asymptotic normality for the case with low-dimensional covariates and non-asymptotic error bounds for that with high-dimensional covariates. Simulation studies and an empirical example are presented to illustrate the usefulness of the new model.

【16】 Community detection in censored hypergraph
Link: https://arxiv.org/abs/2111.03179

Authors: Mingao Yuan, Bin Zhao, Xiaofeng Zhao
Affiliations: School of Mathematics and Statistics, North China University of Water Resources and Electric Power, China
Abstract: Community detection refers to the problem of clustering the nodes of a network (either graph or hypergraph) into groups. Various algorithms are available for community detection, and all these methods apply to uncensored networks. In practice, a network may have censored (or missing) values, and it is shown that censored values have a non-negligible effect on the structural properties of a network. In this paper, we study community detection in censored $m$-uniform hypergraphs from an information-theoretic point of view. We derive the information-theoretic threshold for exact recovery of the community structure. Besides, we propose a polynomial-time algorithm to exactly recover the community structure up to the threshold. The proposed algorithm consists of a spectral algorithm plus a refinement step. It is also interesting to study whether a single spectral algorithm without refinement achieves the threshold. To this end, we also explore the semi-definite relaxation algorithm and analyze its performance.

【17】 Optimal pooling and distributed inference for the tail index and extreme quantiles
Link: https://arxiv.org/abs/2111.03173

Authors: Abdelaati Daouia, Simone A. Padoan, Gilles Stupfler
Affiliations: Toulouse School of Economics, University of Toulouse Capitole, France; Department of Decision Sciences, Bocconi University, Milan, Italy; Univ Rennes, Ensai, CNRS, CREST, Rennes, France
Abstract: This paper investigates pooling strategies for tail index and extreme quantile estimation from heavy-tailed data. To fully exploit the information contained in several samples, we present general weighted pooled Hill estimators of the tail index and weighted pooled Weissman estimators of extreme quantiles calculated through a nonstandard geometric averaging scheme. We develop their large-sample asymptotic theory across a fixed number of samples, covering the general framework of heterogeneous sample sizes with different and asymptotically dependent distributions. Our results include optimal choices of pooling weights based on asymptotic variance and MSE minimization. In the important application of distributed inference, we prove that the variance-optimal distributed estimators are asymptotically equivalent to the benchmark Hill and Weissman estimators based on the unfeasible combination of subsamples, while the AMSE-optimal distributed estimators enjoy a smaller AMSE than the benchmarks in the case of large bias. We consider additional scenarios where the number of subsamples grows with the total sample size and effective subsample sizes can be low. We extend our methodology to handle serial dependence and the presence of covariates. Simulations confirm that our pooled estimators perform virtually as well as the benchmark estimators. Two applications to real weather and insurance data are showcased.
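The inverse-variance intuition behind variance-optimal pooling can be sketched with the classical Hill estimator: its asymptotic variance is $\gamma^2/k$, so weights proportional to the number of top order statistics $k_j$ are a natural first guess. The paper derives the actual optimal weights and the nonstandard geometric scheme for quantiles; the samples below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)

def hill(x, k):
    """Hill estimator of the tail index from the top-k order statistics."""
    xs = np.sort(x)[::-1]
    return np.mean(np.log(xs[:k] / xs[k]))

# Two heavy-tailed (Pareto) samples sharing the true tail index 0.5.
gamma = 0.5
samples = [rng.pareto(1 / gamma, n) + 1 for n in (2000, 5000)]
ks = [100, 250]
hills = np.array([hill(x, k) for x, k in zip(samples, ks)])

# Inverse-variance pooling: since var(Hill) is roughly gamma^2 / k,
# weights proportional to k_j minimize the variance of the pooled estimate
# (a sketch of the weighting idea only).
w = np.array(ks) / sum(ks)
pooled = w @ hills
print(hills, pooled)  # both close to the true tail index 0.5
```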

【18】 Multi-Objective Constrained Optimization for Energy Applications via Tree Ensembles
Link: https://arxiv.org/abs/2111.03140

Authors: Alexander Thebelt, Calvin Tsay, Robert M. Lee, Nathan Sudermann-Merx, David Walz, Tom Tranter, Ruth Misener
Affiliations: Imperial College London, South Kensington, UK; BASF SE, Ludwigshafen am Rhein, Germany; Cooperative State University Mannheim, Mannheim, Germany; Electrochemical Innovation Lab, University College London
Note: 36 pages, 8 figures, 5 tables
Abstract: Energy systems optimization problems are complex due to strongly non-linear system behavior and multiple competing objectives, e.g. economic gain vs. environmental impact. Moreover, a large number of input variables and different variable types, e.g. continuous and categorical, are challenges commonly present in real-world applications. In some cases, proposed optimal solutions need to obey explicit input constraints related to physical properties or safety-critical operating conditions. This paper proposes a novel data-driven strategy using tree ensembles for constrained multi-objective optimization of black-box problems with heterogeneous variable spaces for which underlying system dynamics are either too complex to model or unknown. In an extensive case study comprised of synthetic benchmarks and relevant energy applications, we demonstrate the competitive performance and sampling efficiency of the proposed algorithm compared to other state-of-the-art tools, making it a useful all-in-one solution for real-world applications with limited evaluation budgets.

【19】 Nonparametric Regression and Classification with Functional, Categorical, and Mixed Covariates
Link: https://arxiv.org/abs/2111.03115

Authors: Leonie Selk, Jan Gertheiss
Abstract: We consider nonparametric prediction with multiple covariates, in particular categorical or functional predictors, or a mixture of both. The proposed method is based on an extension of the Nadaraya-Watson estimator where a kernel function is applied to a linear combination of distance measures each calculated on single covariates, with weights being estimated from the training data. The dependent variable can be categorical (binary or multi-class) or continuous; thus we consider both classification and regression problems. The methodology presented is illustrated and evaluated on artificial and real world data. Particularly, it is observed that prediction accuracy can be increased, and irrelevant noise variables can be identified/removed, by "downgrading" the corresponding distance measures in a completely data-driven way.
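A minimal sketch of the estimator's structure, with one numeric and one categorical covariate and hand-picked weights (the paper estimates the weights from training data; all names and values below are invented):

```python
import numpy as np

rng = np.random.default_rng(4)

def nw_predict(x_num, x_cat, y, q_num, q_cat, weights, h=1.0):
    """Nadaraya-Watson prediction with a kernel applied to a weighted
    combination of per-covariate distances (a sketch of the idea)."""
    d_num = np.abs(x_num - q_num)            # distance on numeric covariate
    d_cat = (x_cat != q_cat).astype(float)   # 0/1 distance on categorical one
    d = weights[0] * d_num + weights[1] * d_cat
    k = np.exp(-(d / h) ** 2)                # Gaussian-type kernel
    return np.sum(k * y) / np.sum(k)

x_num = rng.normal(size=500)
x_cat = rng.integers(0, 2, size=500)
y = 2.0 * x_num + 3.0 * x_cat + rng.normal(scale=0.1, size=500)

# A large weight on the categorical distance makes the prediction respect
# the group structure; a weight of 0 would "downgrade" (remove) that
# covariate entirely, which is how irrelevant variables get filtered out.
print(nw_predict(x_num, x_cat, y, q_num=0.0, q_cat=1, weights=(1.0, 5.0)))
```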

【20】 On provision of UK neighbourhood population statistics beyond 2021
Link: https://arxiv.org/abs/2111.03100

Authors: Li-Chun Zhang
Abstract: Census 2021 may well be the last of its kind in the UK. For provision of population statistics in the immediate years following 2021, the basic scheme currently envisaged is to supplement available administrative data with a continuous coverage survey, which amounts to a yearly sample size of about 0.5 million addresses, although the details of the methodology are yet to be determined. Meanwhile, the ONS is seeking alternative approaches, which can make greater use of the relevant administrative data. This report outlines the basic ideas of a rolling approach for provision of UK neighbourhood population statistics beyond 2021, set in the broad perspective of establishing a sustainable future for official statistical systems, which is faster in response, richer in detail and greater in return of long-term cost efficiency.

【21】 Exploiting a Zoo of Checkpoints for Unseen Tasks 标题:利用检查点动物园执行看不见的任务 链接:https://arxiv.org/abs/2111.03628

作者:Jiaji Huang,Qiang Qiu,Kenneth Church 机构:Baidu Research, Sunnyvale, CA, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 备注:Accepted in Neurips 2021 摘要:文献中有如此多的模型,实践者很难决定哪些组合可能对新任务有效。本文试图通过捕获web上发布的检查点之间的关系来解决这个问题。我们将任务空间建模为高斯过程。协方差可以通过检查点和未标记的探测数据来估计。在高斯过程中,我们可以通过最大互信息准则来识别具有代表性的检查点。这个目标是子模块化的。贪婪方法识别可能“覆盖”任务空间的代表。这些代表概括了具有优异性能的新任务。为计算语言学和计算机视觉的应用提供了经验证据。 摘要:There are so many models in the literature that it is difficult for practitioners to decide which combinations are likely to be effective for a new task. This paper attempts to address this question by capturing relationships among checkpoints published on the web. We model the space of tasks as a Gaussian process. The covariance can be estimated from checkpoints and unlabeled probing data. With the Gaussian process, we can identify representative checkpoints by a maximum mutual information criterion. This objective is submodular. A greedy method identifies representatives that are likely to "cover" the task space. These representatives generalize to new tasks with superior performance. Empirical evidence is provided for applications from both computational linguistics as well as computer vision.

【22】 NAS-Bench-x11 and the Power of Learning Curves 标题:NAS-BENCH-X11和学习曲线的力量 链接:https://arxiv.org/abs/2111.03602

作者:Shen Yan,Colin White,Yash Savani,Frank Hutter 机构: Michigan State University, Abacus.AI, Carnegie Mellon University, University of Freiburg, Bosch Center for Artificial Intelligence 备注:NeurIPS 2021 摘要:虽然神经架构搜索(NAS)的早期研究需要极端的计算资源,但最近发布的表格基准和替代基准大大提高了NAS研究的速度和再现性。然而,两个最流行的基准并没有为每个体系结构提供完整的训练信息。因此,在这些基准上,无法运行许多类型的多保真度技术,例如需要在任意epoch评估体系结构的学习曲线外推。在这项工作中,我们提出了一种使用奇异值分解和噪声建模的方法,以创建替代基准NAS-Bench-111、NAS-Bench-311和NAS-Bench-NLP11,它们输出每个体系结构的完整训练信息,而不仅仅是最终的验证精度。我们通过引入一个学习曲线外推框架来修改单保真度算法,展示了使用完整训练信息的能力,表明它比流行的单保真度算法有所改进,而后者在发布时曾被称为最先进的方法。我们的代码和预训练模型可在 https://github.com/automl/nas-bench-x11 获取。 摘要:While early research in neural architecture search (NAS) required extreme computational resources, the recent releases of tabular and surrogate benchmarks have greatly increased the speed and reproducibility of NAS research. However, two of the most popular benchmarks do not provide the full training information for each architecture. As a result, on these benchmarks it is not possible to run many types of multi-fidelity techniques, such as learning curve extrapolation, that require evaluating architectures at arbitrary epochs. In this work, we present a method using singular value decomposition and noise modeling to create surrogate benchmarks, NAS-Bench-111, NAS-Bench-311, and NAS-Bench-NLP11, that output the full training information for each architecture, rather than just the final validation accuracy. We demonstrate the power of using the full training information by introducing a learning curve extrapolation framework to modify single-fidelity algorithms, showing that it leads to improvements over popular single-fidelity algorithms which claimed to be state-of-the-art upon release. Our code and pretrained models are available at https://github.com/automl/nas-bench-x11.

【23】 Hybrid Spectrogram and Waveform Source Separation 标题:混合谱图与波形源分离 链接:https://arxiv.org/abs/2111.03600

作者:Alexandre Défossez 机构: Facebook AI Research 备注:ISMIR 2021 MDX Workshop, 11 pages, 2 figures 摘要:源分离模型要么在频谱图域工作,要么在波形域工作。在这项工作中,我们展示了如何执行端到端混合源分离,让模型决定哪个域最适合每个源,甚至将两者结合起来。Demucs架构的这一混合版本赢得了索尼组织的2021年音乐解混挑战赛(Music Demixing Challenge)。该体系结构还带来了其他改进,例如压缩残差分支、局部注意力和奇异值正则化。总体而言,在MusDB HQ数据集上测得所有声源的信号失真比(SDR)提高了1.4 dB,这一改善得到了人类主观评估的确认:总体质量评级为2.83分(满分5分;非混合Demucs为2.36),无污染评级为3.04(非混合Demucs为2.37,竞赛中排名第二的提交模型为2.44)。 摘要:Source separation models either work on the spectrogram or waveform domain. In this work, we show how to perform end-to-end hybrid source separation, letting the model decide which domain is best suited for each source, and even combining both. The proposed hybrid version of the Demucs architecture won the Music Demixing Challenge 2021 organized by Sony. This architecture also comes with additional improvements, such as compressed residual branches, local attention or singular value regularization. Overall, a 1.4 dB improvement of the Signal-To-Distortion (SDR) was observed across all sources as measured on the MusDB HQ dataset, an improvement confirmed by human subjective evaluation, with an overall quality rated at 2.83 out of 5 (2.36 for the non hybrid Demucs), and absence of contamination at 3.04 (against 2.37 for the non hybrid Demucs and 2.44 for the second ranking model submitted at the competition).

【24】 Mixtures of Laplace Approximations for Improved Post-Hoc Uncertainty in Deep Learning 标题:用于改进深度学习中后自组织不确定性的混合拉普拉斯近似 链接:https://arxiv.org/abs/2111.03577

作者:Runa Eschenhagen,Erik Daxberger,Philipp Hennig,Agustinus Kristiadi 机构:University of Tübingen, University of Cambridge, MPI for Intelligent Systems, Tübingen 备注:Bayesian Deep Learning Workshop, NeurIPS 2021 摘要:深度神经网络容易对异常值做出过度自信的预测。贝叶斯神经网络和深度集成都已被证明能在一定程度上缓解这一问题。在这项工作中,我们的目标是结合这两种方法的优点,提出用高斯混合模型后验进行预测,该后验由独立训练的深度神经网络的拉普拉斯(Laplace)近似的加权和构成。该方法可事后应用于任何一组预训练网络,与常规集成相比只需很小的计算和内存开销。我们从理论上验证了该方法能缓解在远离训练数据处的过度自信,并在标准不确定性量化基准上与最新基线进行了经验比较。 摘要:Deep neural networks are prone to overconfident predictions on outliers. Bayesian neural networks and deep ensembles have both been shown to mitigate this problem to some extent. In this work, we aim to combine the benefits of the two approaches by proposing to predict with a Gaussian mixture model posterior that consists of a weighted sum of Laplace approximations of independently trained deep neural networks. The method can be used post hoc with any set of pre-trained networks and only requires a small computational and memory overhead compared to regular ensembles. We theoretically validate that our approach mitigates overconfidence "far away" from the training data and empirically compare against state-of-the-art baselines on standard uncertainty quantification benchmarks.

【25】 An Empirical Study of Neural Kernel Bandits 标题:神经核带的实证研究 链接:https://arxiv.org/abs/2111.03543

作者:Michal Lisicki,Arash Afkanpour,Graham W. Taylor 机构:University of Guelph, Vector Institute for AI, Google 备注:Presented at Workshop on Bayesian Deep Learning at NeurIPS 2021 摘要:神经强盗使从业者能够有效地处理具有非线性奖励函数的问题。虽然一般情况下,上下文盗贼通常使用高斯过程(GP)预测分布进行决策,但最成功的神经变体仅使用推导中的最后一层参数。神经核研究(NK)最近在深度网络和GPs之间建立了一种对应关系,它考虑了神经网络的所有参数,并且比大多数贝叶斯神经网络训练更有效。我们建议直接应用NK诱导分布来指导基于置信上限或汤普森抽样的策略。我们证明了NK bandits在高度非线性的结构化数据上实现了最先进的性能。此外,我们还分析了训练频率和模型划分等实际因素。我们相信,我们的工作将有助于更好地理解在应用环境中使用NK的影响。 摘要:Neural bandits have enabled practitioners to operate efficiently on problems with non-linear reward functions. While in general contextual bandits commonly utilize Gaussian process (GP) predictive distributions for decision making, the most successful neural variants use only the last layer parameters in the derivation. Research on neural kernels (NK) has recently established a correspondence between deep networks and GPs that take into account all the parameters of a NN and can be trained more efficiently than most Bayesian NNs. We propose to directly apply NK-induced distributions to guide an upper confidence bound or Thompson sampling-based policy. We show that NK bandits achieve state-of-the-art performance on highly non-linear structured data. Furthermore, we analyze practical considerations such as training frequency and model partitioning. We believe our work will help better understand the impact of utilizing NKs in applied settings.

【26】 S-multi-SNE: Semi-Supervised Classification and Visualisation of Multi-View Data 标题:S-MULTI-SNE:多视图数据的半监督分类与可视化 链接:https://arxiv.org/abs/2111.03519

作者:Theodoulos Rodosthenous,Vahid Shahrezaei,Marina Evangelou 机构:Department of Mathematics, Imperial College London, London, SW,AZ, UK 备注:13 pages; 3 figures; 3 tables 摘要:多个领域的研究正在发布越来越多的多视图数据。这种类型的数据对应于多个数据视图,每个视图表示同一组样本的不同方面。我们最近提出了multi-SNE,这是t-SNE的一个扩展,它可以生成多视图数据的单一可视化。multi-SNE方法提供了样本的低维嵌入,这些样本是通过不同的数据视图进行迭代更新而生成的。在这里,我们进一步将multi-SNE扩展到一种半监督方法,该方法通过将标记信息视为额外的数据视图来对未标记样本进行分类。通过将这两种方法应用于具有不同挑战的各种多视图数据集,我们深入研究了multi-SNE及其扩展S-multi-SNE的性能、局限性和优势。我们发现,通过包含标签信息,样本的投影得到了极大的改善,并伴随着强大的分类性能。 摘要:An increasing number of multi-view data are being published by studies in several fields. This type of data corresponds to multiple data-views, each representing a different aspect of the same set of samples. We have recently proposed multi-SNE, an extension of t-SNE, that produces a single visualisation of multi-view data. The multi-SNE approach provides low-dimensional embeddings of the samples, produced by being updated iteratively through the different data-views. Here, we further extend multi-SNE to a semi-supervised approach, that classifies unlabelled samples by regarding the labelling information as an extra data-view. We look deeper into the performance, limitations and strengths of multi-SNE and its extension, S-multi-SNE, by applying the two methods on various multi-view datasets with different challenges. We show that by including the labelling information, the projection of the samples improves drastically and it is accompanied by a strong classification performance.

【27】 Data-driven Hedging of Stock Index Options via Deep Learning 标题:基于深度学习的股指期权数据驱动套期保值 链接:https://arxiv.org/abs/2111.03477

作者:Jie Chen,Lingfei Li 摘要:我们开发了深度学习模型,直接从期权数据中学习标准普尔500指数期权的套期保值比率。我们比较了不同的特征组合,结果表明,在样本外测试中,以成熟时间、Black-Scholes delta和情绪变量(看涨期权的VIX和看跌期权的指数回报)作为输入特征的前馈神经网络模型表现最好。该模型显著优于使用Black-Scholes delta和最新数据驱动模型的标准套期保值实践。我们的结果证明了市场情绪对套期保值效率的重要性,这是以前在制定套期保值策略时被忽视的一个因素。 摘要:We develop deep learning models to learn the hedge ratio for S&P500 index options directly from options data. We compare different combinations of features and show that a feedforward neural network model with time to maturity, Black-Scholes delta and a sentiment variable (VIX for calls and index return for puts) as input features performs the best in the out-of-sample test. This model significantly outperforms the standard hedging practice that uses the Black-Scholes delta and a recent data-driven model. Our results demonstrate the importance of market sentiment for hedging efficiency, a factor previously ignored in developing hedging strategies.

【28】 Conformer-based Hybrid ASR System for Switchboard Dataset 标题:基于一致性的交换机数据集混合ASR系统 链接:https://arxiv.org/abs/2111.03442

作者:Mohammad Zeineldeen,Jingjing Xu,Christoph Lüscher,Wilfried Michel,Alexander Gerstenberger,Ralf Schlüter,Hermann Ney 机构:Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany, AppTek GmbH, Aachen, Germany 备注:Submitted to ICASSP 2022 摘要:最近提出的conformer体系结构已成功用于端到端自动语音识别(ASR),在不同数据集上实现了最先进的性能。据我们所知,尚无研究考察在混合ASR中使用conformer声学模型的影响。在本文中,我们提出并评估了一种有竞争力的基于conformer的混合模型训练方案。我们研究了不同的训练环节和方法,以降低词错误率并提高训练速度。我们采用时间降采样方法进行高效训练,并使用转置卷积对输出序列重新上采样。我们在Switchboard 300小时数据集上进行了实验,与其他体系结构相比,我们基于conformer的混合模型取得了有竞争力的结果。它在Hub5'01测试集上泛化良好,并显著优于基于BLSTM的混合模型。 摘要:The recently proposed conformer architecture has been successfully used for end-to-end automatic speech recognition (ASR) architectures achieving state-of-the-art performance on different datasets. To our best knowledge, the impact of using conformer acoustic model for hybrid ASR is not investigated. In this paper, we present and evaluate a competitive conformer-based hybrid model training recipe. We study different training aspects and methods to improve word-error-rate as well as to increase training speed. We apply time downsampling methods for efficient training and use transposed convolutions to upsample the output sequence again. We conduct experiments on Switchboard 300h dataset and our conformer-based hybrid model achieves competitive results compared to other architectures. It generalizes very well on Hub5'01 test set and outperforms the BLSTM-based hybrid model significantly.

【29】 Time to critical condition in emergency services 标题:紧急服务到达危急状态的时间 链接:https://arxiv.org/abs/2111.03440

作者:Pedro A. Pury 机构:Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, Ciudad Universitaria, X,HUA Córdoba, Argentina 备注:None 摘要:无论运行情况如何,提供不间断响应服务对于紧急医疗服务至关重要。因此,可靠地估计到达临界状态的时间(在该状态下,将没有可用的服务器来响应下一个传入呼叫)成为非常有用的系统性能度量。在本文中,我们通过给出到达短缺状态的平均时间的显式公式,构造了一个关键绩效指标。该平均时间的解析表达式是并行服务器数量以及到达间隔时间和服务时间的函数。我们的解析表达式假设时间服从指数分布,但为了在更现实的情况下评估到达临界状态的平均首达时间,我们通过对数正态服务时间分布的详尽模拟验证了我们的结果。为此,我们用$R$实现了一个模拟器。我们的结果表明,在任何具有实际意义的情形下,我们的解析公式都是可接受的近似。 摘要:Providing uninterrupted response service is of paramount importance for emergency medical services, regardless of the operating scenario. Thus, reliable estimates of the time to the critical condition, under which there will be no available servers to respond the next incoming call, become very useful measures of the system's performance. In this contribution, we develop a key performance indicator by providing an explicit formula for the average time to the shortage condition. Our analytical expression for this average time is a function of the number of parallel servers and the inter-arrival and service times. We assume exponential distributions of times in our analytical expression but for evaluating the mean first-passage time to the critical condition under more realistic scenarios we validate our result through exhaustive simulations with lognormal service time distributions. For this task we have implemented a simulator in $R$. Our results indicate that our analytical formula is an acceptable approximation under any situation of practical interest.
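摘要未给出论文中的显式公式;下面用纯Python给出一个蒙特卡罗示意(并非论文中的R模拟器),在指数到达与指数服务的假设下,估计系统从空闲状态到“全部c个服务器同时占用”这一临界状态的平均首达时间:

```python
import random

def mean_time_to_shortage(c, lam, mu, n_runs=20000, seed=0):
    """Monte Carlo estimate of the mean first-passage time from an empty
    system to the state where all c servers are busy, under Poisson
    arrivals (rate lam) and exponential services (rate mu per server)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        n, t = 0, 0.0            # n = number of busy servers
        while n < c:
            rate = lam + n * mu  # total event rate in state n
            t += rng.expovariate(rate)
            # next event: arrival with prob lam/rate, departure otherwise
            n += 1 if rng.random() < lam / rate else -1
        total += t
    return total / n_runs

print(mean_time_to_shortage(c=3, lam=1.0, mu=1.0))  # analytic value is 8
```

对 c=1 的情形,该平均时间就是首个到达的期望 1/lam,可用来校验模拟器。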

【30】 Meta-Forecasting by combining Global Deep Representations with Local Adaptation 标题:全局深度表示与局部适应相结合的元预测 链接:https://arxiv.org/abs/2111.03418

作者:Riccardo Grazzi,Valentin Flunkert,David Salinas,Tim Januschowski,Matthias Seeger,Cedric Archambeau 机构:IIT and UCL, Amazon Web Services 摘要:虽然经典的时间序列预测孤立地考虑单个时间序列,但基于深度学习的最新进展表明,从大量相关时间序列中联合学习可以提高预测精度。然而,在对样本外时间序列建模时,这些方法的准确性会大幅下降,与经典预测方法相比,其适用性受到很大限制。为了弥补这一差距,我们采用时间序列预测问题的元学习视角。我们提出了一种称为元全局-局部自回归(Meta-GLAR)的新预测方法,该方法通过以闭式学习从递归神经网络(RNN)产生的表示到一步超前预测的映射来适应每个时间序列。关键的是,RNN的参数通过对闭式自适应机制进行反向传播,在多个时间序列上共同学习。在广泛的实证评估中,我们表明该方法在样本外预测精度上与早期工作报告的最新水平具有竞争力。 摘要:While classical time series forecasting considers individual time series in isolation, recent advances based on deep learning showed that jointly learning from a large pool of related time series can boost the forecasting accuracy. However, the accuracy of these methods suffers greatly when modeling out-of-sample time series, significantly limiting their applicability compared to classical forecasting methods. To bridge this gap, we adopt a meta-learning view of the time series forecasting problem. We introduce a novel forecasting method, called Meta Global-Local Auto-Regression (Meta-GLAR), that adapts to each time series by learning in closed-form the mapping from the representations produced by a recurrent neural network (RNN) to one-step-ahead forecasts. Crucially, the parameters of the RNN are learned across multiple time series by backpropagating through the closed-form adaptation mechanism. In our extensive empirical evaluation we show that our method is competitive with the state-of-the-art in out-of-sample forecasting accuracy reported in earlier work.

【31】 Dual Parameterization of Sparse Variational Gaussian Processes 标题:稀疏变分高斯过程的对偶参数化 链接:https://arxiv.org/abs/2111.03412

作者:Vincent Adam,Paul E. Chang,Mohammad Emtiyaz Khan,Arno Solin 机构:Aalto University Secondmind.ai, Espoo, Finland Cambridge, UK, RIKEN Center for AI Project, Tokyo, Japan 备注:To appear in Advances in Neural Information Processing Systems (NeurIPS 2021) 摘要:稀疏变分高斯过程(SVGP)方法是非共轭高斯过程推理的常用方法,因为它们具有计算优势。在本文中,我们通过使用双重参数化来提高它们的计算效率,其中每个数据示例被分配双重参数,类似于期望传播中使用的站点参数。我们的双重参数化使用自然梯度下降加速推理,并为超参数学习提供更严格的证据下限。该方法与当前的SVGP方法具有相同的内存开销,但它更快、更准确。 摘要:Sparse variational Gaussian process (SVGP) methods are a common choice for non-conjugate Gaussian process inference because of their computational benefits. In this paper, we improve their computational efficiency by using a dual parameterization where each data example is assigned dual parameters, similarly to site parameters used in expectation propagation. Our dual parameterization speeds-up inference using natural gradient descent, and provides a tighter evidence lower bound for hyperparameter learning. The approach has the same memory cost as the current SVGP methods, but it is faster and more accurate.

【32】 Long Range Probabilistic Forecasting in Time-Series using High Order Statistics 标题:基于高阶统计量的时间序列长期概率预测 链接:https://arxiv.org/abs/2111.03394

作者:Prathamesh Deshpande,Sunita Sarawagi 机构:IIT Bombay 备注:9 pages, 2 figures, 3 tables, 1 algorithm 摘要:长期预测是许多决策支持系统的起点,这些系统需要从预测值的高级聚合模式中进行推理。最先进的时间序列预测方法要么在长期预测中受到概念漂移的影响,要么无法准确预测连贯和准确的高层总量。在这项工作中,我们提出了一种新的概率预测方法,该方法产生的预测在基本水平和预测的聚合统计方面是一致的。我们使用一种新的推理方法实现了预测基准水平和聚合统计数据之间的一致性。我们的推理方法基于KL散度,可以有效地以闭合形式求解。我们表明,我们的方法提高了在三个不同领域的真实数据集上进行推理后的基础级和不可见聚合的预测性能。 摘要:Long range forecasts are the starting point of many decision support systems that need to draw inference from high-level aggregate patterns on forecasted values. State of the art time-series forecasting methods are either subject to concept drift on long-horizon forecasts, or fail to accurately predict coherent and accurate high-level aggregates. In this work, we present a novel probabilistic forecasting method that produces forecasts that are coherent in terms of base level and predicted aggregate statistics. We achieve the coherency between predicted base-level and aggregate statistics using a novel inference method. Our inference method is based on KL-divergence and can be solved efficiently in closed form. We show that our method improves forecast performance across both base level and unseen aggregates post inference on real datasets ranging three diverse domains.
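摘要中的KL闭式推断未展开细节;下面用纯Python示意“一致性调整”的一个最简单特例(独立高斯基础层预测在“均值之和等于聚合预测”约束下的条件化/KL投影,属于本示例的简化假设,并非论文的完整方法):

```python
def reconcile_gaussian(means, variances, aggregate):
    """Adjust independent Gaussian base-level forecast means so that they
    sum exactly to a given aggregate forecast. The gap is spread across
    series in proportion to their variances (the KL/conditioning
    projection in this simple Gaussian case)."""
    gap = aggregate - sum(means)
    total_var = sum(variances)
    return [m + v / total_var * gap for m, v in zip(means, variances)]

adjusted = reconcile_gaussian([10.0, 20.0, 30.0], [1.0, 1.0, 2.0], aggregate=64.0)
print(adjusted)       # → [11.0, 21.0, 32.0]
print(sum(adjusted))  # → 64.0: base level and aggregate are now coherent
```

方差大的序列吸收更多调整量,这体现了“按不确定性分摊不一致”的一般原则。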

【33】 FINN.no Slates Dataset: A new Sequential Dataset Logging Interactions, all Viewed Items and Click Responses/No-Click for Recommender Systems Research 标题:FINN.no Slates数据集:一种新的顺序数据集,记录交互、所有查看的项目和点击响应/无点击,以用于推荐系统研究 链接:https://arxiv.org/abs/2111.03340

作者:Simen Eide,Arnoldo Frigessi,Helge Jenssen,David S. Leslie,Joakim Rishaug,Sofie Verrewaere 机构:CCS Concepts: • Information systems → Personalization., Additional Key Words and Phrases: slate recommendations, search result, candidate sampling, marketplace data, reinforcement, learning, bandit, item attributes, off-policy, ACM Reference Format: 备注:5 pages, Fifteen ACM Conference on Recommender Systems (recsys21), 2021, Amsterdam, Netherlands 摘要:我们提出了一个新的推荐系统数据集,记录用户与在线市场之间的顺序交互。用户依次收到来自市场的推荐和搜索结果,它们以项目排序列表的形式呈现,称为slate。数据集包括每一轮展示的slate、用户是否点击了其中的项目,以及用户点击了哪个项目。尽管曝光数据在推荐系统中的使用正在增长,但据我们所知,目前还没有一个开放的大规模推荐系统数据集包含每次交互中呈现给用户的项目列表。因此,大多数关于推荐系统的文章都没有利用这种曝光信息。相反,所提出的模型只依赖用户的点击响应,并假设用户在每一步都暴露于项目全集中的所有项目,这通常称为统一候选抽样。这是一个不完整的假设,因为它把用户可能从未接触过的项目也计算在内。这样一来,一些项目可能会被错误地视为用户不感兴趣。考虑实际展示的slate后,模型可以使用更自然的似然,即给定项目曝光集合下的点击概率,这在bandit和强化学习文献中很普遍。Eide等人(2021)的研究表明,基于统一候选抽样(及类似假设)的似然隐含地假设平台只向用户展示最相关的项目。这会导致推荐系统隐式地强化反馈循环,并偏向先前向用户展示过的项目。 摘要:We present a novel recommender systems dataset that records the sequential interactions between users and an online marketplace. The users are sequentially presented with both recommendations and search results in the form of ranked lists of items, called slates, from the marketplace. The dataset includes the presented slates at each round, whether the user clicked on any of these items and which item the user clicked on. Although the usage of exposure data in recommender systems is growing, to our knowledge there is no open large-scale recommender systems dataset that includes the slates of items presented to the users at each interaction. As a result, most articles on recommender systems do not utilize this exposure information. Instead, the proposed models only depend on the user's click responses, and assume that the user is exposed to all the items in the item universe at each step, often called uniform candidate sampling. This is an incomplete assumption, as it takes into account items the user might not have been exposed to.
This way items might be incorrectly considered as not of interest to the user. Taking into account the actually shown slates allows the models to use a more natural likelihood, based on the click probability given the exposure set of items, as is prevalent in the bandit and reinforcement learning literature. \cite{Eide2021DynamicSampling} shows that likelihoods based on uniform candidate sampling (and similar assumptions) are implicitly assuming that the platform only shows the most relevant items to the user. This causes the recommender system to implicitly reinforce feedback loops and to be biased towards previously exposed items to the user.
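下面用纯Python示意“基于实际曝光slate的似然”与统一候选抽样的区别(softmax打分加“不点击”选项只是常见的建模假设,并非该数据集或论文附带的官方实现):

```python
import math

def slate_log_likelihood(scores, slate, clicked):
    """Log-probability of the observed response under a softmax over the
    items actually shown, plus a no-click option with score 0 -- rather
    than assuming exposure to the whole item universe."""
    logits = [scores[i] for i in slate] + [0.0]   # last entry = no click
    log_z = math.log(sum(math.exp(l) for l in logits))
    idx = slate.index(clicked) if clicked is not None else len(slate)
    return logits[idx] - log_z

scores = {"a": 2.0, "b": 0.5, "c": -1.0, "d": 0.0}  # model scores per item
# only items "a" and "b" were shown; the user clicked "a"
print(slate_log_likelihood(scores, slate=["a", "b"], clicked="a"))
```

未展示的项目("c"、"d")完全不进入归一化项,因此未被曝光的项目不会被当作“被拒绝”的负例。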

【34】 Learning on Random Balls is Sufficient for Estimating (Some) Graph Parameters 标题:随机球上的学习足以估计(某些)图参数 链接:https://arxiv.org/abs/2111.03317

作者:Takanori Maehara,Hoang NT 机构:Facebook AI, London, United Kingdom, Tokyo Tech & RIKEN AIP, Tokyo, Japan 备注:The manuscript is accepted as a poster presentation at NeurIPS 2021. This ArXiv version includes the Appendix 摘要:对图学习方法的理论分析通常假设对输入图进行完全观察。由于实践中的可伸缩性问题,这样的假设对于处理任何大小的图可能都没有用处。在这项工作中,我们发展了一个在部分观测环境(即子图抽样)下的图分类问题的理论框架。借助于图极限理论,我们提出了一种新的图分类模型,该模型适用于随机抽样的子图,并提出了一种新的拓扑结构来表征该模型的可表示性。我们的理论框架有助于对图的小批量学习进行理论验证,并在不假设输入的情况下,在泛化界和大小泛化性方面得到新的学习理论结果。 摘要:Theoretical analyses for graph learning methods often assume a complete observation of the input graph. Such an assumption might not be useful for handling any-size graphs due to the scalability issues in practice. In this work, we develop a theoretical framework for graph classification problems in the partial observation setting (i.e., subgraph samplings). Equipped with insights from graph limit theory, we propose a new graph classification model that works on a randomly sampled subgraph and a novel topology to characterize the representability of the model. Our theoretical framework contributes a theoretical validation of mini-batch learning on graphs and leads to new learning-theoretic results on generalization bounds as well as size-generalizability without assumptions on the input.

【35】 Distilling Heterogeneity: From Explanations of Heterogeneous Treatment Effect Models to Interpretable Policies 标题:提炼异质性:从异质治疗效果模型的解释到可解释的政策 链接:https://arxiv.org/abs/2111.03267

作者:Han Wu,Sarah Tan,Weiwei Li,Mia Garrard,Adam Obeng,Drew Dimmery,Shaun Singh,Hanson Wang,Daniel Jiang,Eytan Bakshy 备注:A short version was presented at MIT CODE 2021 摘要:互联网公司越来越多地使用机器学习模型来创建个性化的策略,为每个人分配最佳预测治疗。它们通常来自预测个体水平治疗效果的黑盒异质治疗效果(HTE)模型。在本文中,我们主要关注(1)HTE模型的学习解释;(2) 学习规定治疗任务的可解释政策。我们还提出了指导树,这是一种在不丢失可解释性的情况下集成多个可解释策略的方法。这些基于规则的可解释策略易于部署,并且无需在生产环境中维护HTE模型。 摘要:Internet companies are increasingly using machine learning models to create personalized policies which assign, for each individual, the best predicted treatment for that individual. They are frequently derived from black-box heterogeneous treatment effect (HTE) models that predict individual-level treatment effects. In this paper, we focus on (1) learning explanations for HTE models; (2) learning interpretable policies that prescribe treatment assignments. We also propose guidance trees, an approach to ensemble multiple interpretable policies without the loss of interpretability. These rule-based interpretable policies are easy to deploy and avoid the need to maintain a HTE model in a production environment.

【36】 Boundary Estimation from Point Clouds: Algorithms, Guarantees and Applications 标题:点云边界估计:算法、保证和应用 链接:https://arxiv.org/abs/2111.03217

作者:Jeff Calder,Sangmin Park,Dejan Slepčev 机构: DEPARTMENT OF MATHEMATICAL SCIENCES, CARNEGIE MELLON UNIVERSITY 备注:46 pages, 13 figures 摘要:我们研究如何从域内的样本点识别域的边界。我们提出了边界法向量与点到边界距离的新估计量,以及检验一个点是否位于边界带内的方法。这些估计量计算高效,并且比文献中已有的估计量更准确。我们为这些估计量给出了严格的误差估计。此外,我们利用检测到的边界点来求解点云上偏微分方程的边值问题。我们证明了点云上拉普拉斯方程和eikonal方程的误差估计。最后,我们提供了一系列数值实验,以说明我们的边界估计量的性能、其在点云PDE中的应用,以及在图像数据集上的测试。 摘要:We investigate identifying the boundary of a domain from sample points in the domain. We introduce new estimators for the normal vector to the boundary, distance of a point to the boundary, and a test for whether a point lies within a boundary strip. The estimators can be efficiently computed and are more accurate than the ones present in the literature. We provide rigorous error estimates for the estimators. Furthermore we use the detected boundary points to solve boundary-value problems for PDE on point clouds. We prove error estimates for the Laplace and eikonal equations on point clouds. Finally we provide a range of numerical experiments illustrating the performance of our boundary estimators, applications to PDE on point clouds, and tests on image data sets.
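边界检测的核心直觉可以用一个纯Python玩具示意:内部点的邻域在各方向上大致对称,而边界附近点的邻域是单侧的,因此邻居偏移量均值的范数可以作为边界得分(这只是体现该直觉的通用启发式,并非论文中的估计量):

```python
import math, random

def boundary_scores(points, radius):
    """Norm of the mean offset to neighbours within `radius`, for each 2D
    point. Interior points have balanced neighbourhoods (small score);
    points near the boundary do not (large score). The mean-offset
    direction also points roughly along the inward normal."""
    scores = []
    for p in points:
        nbrs = [q for q in points if q != p and math.dist(p, q) <= radius]
        if not nbrs:
            scores.append(0.0)
            continue
        mx = sum(q[0] - p[0] for q in nbrs) / len(nbrs)
        my = sum(q[1] - p[1] for q in nbrs) / len(nbrs)
        scores.append(math.hypot(mx, my))
    return scores

# sample the unit disk: points near |p| = 1 should score highest
rng = random.Random(0)
pts = []
while len(pts) < 400:
    x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
    if x * x + y * y <= 1.0:
        pts.append((x, y))
scores = boundary_scores(pts, radius=0.3)
```

对得分设阈值即可挑出“边界带”内的候选点,再供下游(例如点云PDE的边值条件)使用。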

【37】 Rate of Convergence of Polynomial Networks to Gaussian Processes 标题:多项式网络对高斯过程的收敛速度 链接:https://arxiv.org/abs/2111.03175

作者:Adam Klukowski 机构:Huawei Noah’s Ark Lab 备注:23 pages (13 for the main body) 摘要:我们研究具有随机权重的单隐层神经网络。众所周知,在神经元数目趋于无穷的极限下,它们简化为高斯过程。对于具有多项式激活的网络,我们证明了这种收敛在2-Wasserstein度量下的速度为 $O(n^{-\frac{1}{2}})$,其中 $n$ 是隐藏神经元的数量。我们猜测该速率是渐近紧的。我们还改进了其他激活函数的已知收敛速度:对ReLU为 $n$ 的幂律速度,对erf为至多相差对数因子的逆平方根速度。我们探讨了非各向同性情形下球谐函数、Stein核与最优输运之间的相互作用。 摘要:We examine one-hidden-layer neural networks with random weights. It is well-known that in the limit of infinitely many neurons they simplify to Gaussian processes. For networks with a polynomial activation, we demonstrate that the rate of this convergence in 2-Wasserstein metric is $O(n^{-\frac{1}{2}})$, where $n$ is the number of hidden neurons. We suspect this rate is asymptotically sharp. We improve the known convergence rate for other activations, to power-law in $n$ for ReLU and inverse-square-root up to logarithmic factors for erf. We explore the interplay between spherical harmonics, Stein kernels and optimal transport in the non-isotropic setting.
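对于二次激活 $\phi(t)=t^2$、$w\sim N(0,I_d)$,极限高斯过程的核有闭式(由Wick/Isserlis公式):$K(x,y)=\|x\|^2\|y\|^2+2\langle x,y\rangle^2$。下面用纯Python对这一闭式做蒙特卡罗验证,以示意收敛的极限对象(并非论文的Wasserstein速率证明):

```python
import random

def mc_kernel(x, y, n_samples=100000, seed=0):
    """Monte Carlo estimate of E[phi(w.x) * phi(w.y)] for phi(t) = t^2 and
    w ~ N(0, I): the limiting GP kernel of a wide one-hidden-layer network
    with quadratic activation."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        w = [rng.gauss(0.0, 1.0) for _ in x]
        u = sum(wi * xi for wi, xi in zip(w, x))
        v = sum(wi * yi for wi, yi in zip(w, y))
        acc += (u * u) * (v * v)
    return acc / n_samples

def analytic_kernel(x, y):
    # Wick/Isserlis: E[u^2 v^2] = |x|^2 |y|^2 + 2 (x.y)^2
    nx = sum(a * a for a in x)
    ny = sum(b * b for b in y)
    dot = sum(a * b for a, b in zip(x, y))
    return nx * ny + 2.0 * dot * dot

x, y = (1.0, 0.0, 0.0), (0.6, 0.8, 0.0)
print(mc_kernel(x, y), analytic_kernel(x, y))  # both close to 1.72
```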

【38】 Hard Negative Sampling via Regularized Optimal Transport for Contrastive Representation Learning 标题:对比表征学习的正则化最优传输硬负采样 链接:https://arxiv.org/abs/2111.03169

作者:Ruijie Jiang,Prakash Ishwar,Shuchin Aeron 摘要:我们研究了非监督对比表征学习中硬负抽样分布的设计问题。我们分析了一个新的最小-最大框架,该框架寻求在所有耦合(受边际约束的正样本和负样本之间的联合分布)上最小化最大(最坏情况)广义对比学习损失的表示,并证明得到的最小-最大最佳表示将退化。这为在耦合上加入额外的正则化约束提供了第一个理论依据。我们通过最优输运理论的透镜重新解释最小-最大问题,并利用正则化输运耦合来控制反例的硬度。我们证明了最近提出的最新硬负采样分布是对应于耦合熵正则化的特例。 摘要:We study the problem of designing hard negative sampling distributions for unsupervised contrastive representation learning. We analyze a novel min-max framework that seeks a representation which minimizes the maximum (worst-case) generalized contrastive learning loss over all couplings (joint distributions between positive and negative samples subject to marginal constraints) and prove that the resulting min-max optimum representation will be degenerate. This provides the first theoretical justification for incorporating additional regularization constraints on the couplings. We re-interpret the min-max problem through the lens of Optimal Transport theory and utilize regularized transport couplings to control the degree of hardness of negative examples. We demonstrate that the state-of-the-art hard negative sampling distributions that were recently proposed are a special case corresponding to entropic regularization of the coupling.

【39】 GraN-GAN: Piecewise Gradient Normalization for Generative Adversarial Networks 标题:GRAN-GAN:生成性对抗网络的分段梯度归一化 链接:https://arxiv.org/abs/2111.03162

作者:Vineeth S. Bhaskara,Tristan Aumentado-Armstrong,Allan Jepson,Alex Levinshtein 机构:Samsung AI Centre Toronto, University of Toronto, Vector Institute for AI 备注:WACV 2022 Main Conference Paper (Submitted: 18 Aug 2021, Accepted: 4 Oct 2021) 摘要:现代生成性对抗网络(GAN)主要在鉴别器(或批评者)中使用分段线性激活函数,包括ReLU和LeakyReLU。这种模型学习分段线性映射,其中每个分段处理输入空间的子集,每个子集的梯度是分段常数。在这类鉴别器(或批评家)函数下,我们提出了梯度归一化(GraN),这是一种新的与输入相关的归一化方法,它保证了输入空间中的分段K-Lipschitz约束。与光谱规范化不同,GraN不限制单个网络层的处理,并且与梯度惩罚不同,GraN几乎在任何地方都严格执行分段Lipschitz约束。根据经验,我们在多个数据集(包括CIFAR-10/100、STL-10、LSUN卧室和CelebA)、GAN损失函数和度量中展示了改进的图像生成性能。此外,我们分析了在几个标准GAN中改变经常不协调的Lipschitz常数K,不仅获得了显著的性能提升,而且还发现了K和训练动态之间的联系,特别是在低梯度损失高原,使用通用Adam优化器。 摘要:Modern generative adversarial networks (GANs) predominantly use piecewise linear activation functions in discriminators (or critics), including ReLU and LeakyReLU. Such models learn piecewise linear mappings, where each piece handles a subset of the input space, and the gradients per subset are piecewise constant. Under such a class of discriminator (or critic) functions, we present Gradient Normalization (GraN), a novel input-dependent normalization method, which guarantees a piecewise K-Lipschitz constraint in the input space. In contrast to spectral normalization, GraN does not constrain processing at the individual network layers, and, unlike gradient penalties, strictly enforces a piecewise Lipschitz constraint almost everywhere. Empirically, we demonstrate improved image generation performance across multiple datasets (incl. CIFAR-10/100, STL-10, LSUN bedrooms, and CelebA), GAN loss functions, and metrics. Further, we analyze altering the often untuned Lipschitz constant K in several standard GANs, not only attaining significant performance gains, but also finding connections between K and training dynamics, particularly in low-gradient loss plateaus, with the common Adam optimizer.

【40】 A SEIR model with time-varying coefficients for analysing the SARS-CoV-2 epidemic 标题:分析SARS-CoV-2疫情的时变系数SEIR模型 链接:https://arxiv.org/abs/2111.03157

作者:P. Girardi,C. Gaetan 机构:Department of Developmental and Social Psychology, University of Padova, Italy; Department of Statistical Sciences, Italy; Department of Environmental Sciences, Ca’ Foscari University of Venice 备注:None 摘要:在这项研究中,我们提出了一个时间依赖的易感-暴露-感染-恢复(SEIR)模型,用于分析三个不同国家(美国、意大利和冰岛)的SARS-CoV-2疫情爆发,使用与疫情波次病例数相关的公开数据。由于各国政府采取了多种类型和等级的措施,包括旅行限制、社交距离或行动限制,我们希望考察这些措施如何影响感染人群的流行曲线。我们采用复合似然方法估计SEIR模型的相关参数,并针对时间相依性对标准误进行了校正。采取限制性措施会使流行曲线趋于平缓,而未来的演变表明病例数量将减少。 摘要:In this study, we propose a time-dependent Susceptible-Exposed-Infected-Recovered (SEIR) model for the analysis of the SARS-CoV-2 epidemic outbreak in three different countries, the United States of America, Italy and Iceland using public data inherent to the numbers of the epidemic wave. Since several types and grades of actions were adopted by the governments, including travel restrictions, social distancing, or limitation of movement, we want to investigate how these measures can affect the epidemic curve of the infectious population. The parameters of interest for the SEIR model were estimated employing a composite likelihood approach. Moreover, standard errors have been corrected for temporal dependence. The adoption of restrictive measures results in flatten epidemic curves, and the future evolution indicated a decrease in the number of cases.
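时变系数SEIR的机制可以用纯Python的显式欧拉积分示意(传染率β在“限制措施生效”时下调;以下参数均为示例假设,并非论文对三国数据的复合似然估计结果):

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt):
    """One explicit-Euler step of the SEIR equations with a (possibly
    time-varying) transmission rate beta."""
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt   # S -> E
    new_infectious = sigma * e * dt       # E -> I
    new_recovered = gamma * i * dt        # I -> R
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recovered,
            r + new_recovered)

def beta_t(t):
    # transmission drops when restrictions start on day 30 (toy values)
    return 0.5 if t < 30 else 0.1

state = (0.99, 0.0, 0.01, 0.0)  # S, E, I, R as population fractions
t, dt = 0.0, 0.1
while t < 120.0:
    state = seir_step(*state, beta_t(t), sigma=0.2, gamma=0.1, dt=dt)
    t += dt
print(state)  # final state: restrictions flatten the infectious curve
```

将 β 换成限制措施更强(更小)的取值重跑,即可观察到摘要所说的“曲线变平”效应。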

【41】 Amortized Variational Inference for Simple Hierarchical Models 标题:简单分层模型的分期变分推理 链接:https://arxiv.org/abs/2111.03144

作者:Abhinav Agrawal,Justin Domke 机构:College of Information and Computer Science, University of Massachusetts Amherst 备注:Neural Information Processing Systems (NeurIPS) 2021 摘要:在层次模型中,由于局部潜变量的数量随数据集规模增长,很难将子抽样与变分推理结合使用。因此,大规模层次模型中的推理仍然是一个挑战。使用结构与后验分布相匹配的变分族是有帮助的,但由于局部分布数量巨大,优化速度仍然很慢。为此,本文提出了一种摊销方法,用共享参数同时表示所有局部分布。这种方法与使用给定的联合分布(例如满秩高斯分布)同样精确,但在大几个数量级的数据集上仍然可行。它也比使用结构化变分分布快得多。 摘要:It is difficult to use subsampling with variational inference in hierarchical models since the number of local latent variables scales with the dataset. Thus, inference in hierarchical models remains a challenge at large scale. It is helpful to use a variational family with structure matching the posterior, but optimization is still slow due to the huge number of local distributions. Instead, this paper suggests an amortized approach where shared parameters simultaneously represent all local distributions. This approach is similarly accurate as using a given joint distribution (e.g., a full-rank Gaussian) but is feasible on datasets that are several orders of magnitude larger. It is also dramatically faster than using a structured variational distribution.

【42】 Big-Step-Little-Step: Efficient Gradient Methods for Objectives with Multiple Scales 标题:大步小步:多尺度目标的高效梯度法 链接:https://arxiv.org/abs/2111.03137

作者:Jonathan Kelner,Annie Marsden,Vatsal Sharan,Aaron Sidford,Gregory Valiant,Honglin Yuan 机构:MIT, Stanford University, USC† 备注:95 pages, 4 figures; authors are listed in alphabetical order 摘要:我们提出了新的基于梯度的方法,用于高效求解一大类病态优化问题。我们考虑最小化函数 $f : \mathbb{R}^d \rightarrow \mathbb{R}$,它可以隐式分解为 $m$ 个未知的、互不作用的光滑强凸函数之和;我们给出的方法求解该问题所需的梯度计算次数(至多相差对数因子)与各分量条件数平方根的乘积成比例。我们证明这一复杂度界几乎是最优的,并且相比加速梯度法可以带来近乎指数级的改进,后者的复杂度随 $f$ 的条件数的平方根增长。此外,我们还为该多尺度优化问题的随机二次变体提供了高效方法。我们的方法并不学习 $f$ 的分解(这将代价高昂),而是对标准方法进行简洁的递归式"大步-小步"交错。由此得到的算法使用 $\tilde{\mathcal{O}}(dm)$ 空间,数值稳定,并为更细粒度地理解超越条件数的凸优化复杂度打开了大门。 摘要:We provide new gradient-based methods for efficiently solving a broad class of ill-conditioned optimization problems. We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions and provide a method which solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the square-root of the condition numbers of the components. This complexity bound (which we prove is nearly optimal) can improve almost exponentially on that of accelerated gradient methods, which grow as the square root of the condition number of $f$. Additionally, we provide efficient methods for solving stochastic, quadratic variants of this multiscale optimization problem. Rather than learn the decomposition of $f$ (which would be prohibitively expensive), our methods apply a clean recursive "Big-Step-Little-Step" interleaving of standard methods. The resulting algorithms use $\tilde{\mathcal{O}}(dm)$ space, are numerically stable, and open the door to a more fine-grained understanding of the complexity of convex optimization beyond condition number.
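下面是对"大步-小步"交错这一思想的一个高度简化的启发式示意(并非论文中的递归算法;目标函数、步长与交错频率都是本文假设的玩具设定):在一个由两个尺度悬殊的分量组成的二次目标上,多数迭代按刚性分量取小步长,偶尔为平坦分量走一次大步,两个尺度都能快速收敛,而单一小步长对平坦方向几乎无进展。

```python
def grad(x):
    # 玩具目标 f(x) = 0.5*(100*x0^2 + 0.01*x1^2),条件数为 10^4
    return [100 * x[0], 0.01 * x[1]]

def interleaved_gd(x0, iters=20):
    x = list(x0)
    small, big = 1.0 / 100, 1.0 / 0.01  # 每个尺度一个步长
    for k in range(iters):
        # 多数迭代走小步(对刚性方向稳定),偶尔走一次大步(推进平坦方向)
        step = big if k % 10 == 9 else small
        g = grad(x)
        x = [x[0] - step * g[0], x[1] - step * g[1]]
    return x

x_final = interleaved_gd([1.0, 1.0])

# 对照:只用小步长的普通梯度下降,平坦方向 x1 几乎不动
x_plain = [1.0, 1.0]
for _ in range(20):
    g = grad(x_plain)
    x_plain = [x_plain[0] - 0.01 * g[0], x_plain[1] - 0.01 * g[1]]
```

这个两尺度例子只传达交错的直觉;论文的方法是对 m 个未知尺度的递归交错,并给出相应的复杂度保证。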

【43】 Scaffolding Sets 标题:脚手架套装 链接:https://arxiv.org/abs/2111.03135

作者:Maya Burhanpurkar,Zhun Deng,Cynthia Dwork,Linjun Zhang 机构:†Harvard University 备注:32 pages, 4 figures 摘要:预测器将总体中的单个实例映射到区间 $[0,1]$。对于总体子集的集族 $\mathcal{C}$,如果预测器在 $\mathcal{C}$ 中的每个集合上同时满足校准条件,则称其相对于 $\mathcal{C}$ 是多重校准的。我们开始研究脚手架集的构造:脚手架集是一个小的集族 $\mathcal{S}$,其性质是,相对于 $\mathcal{S}$ 的多重校准不仅保证预测器的校准性,还保证其正确性。我们的方法受到如下经验认识的启发:神经网络的中间层会学习到高度结构化且有用的数据表示。 摘要:Predictors map individual instances in a population to the interval $[0,1]$. For a collection $\mathcal{C}$ of subsets of a population, a predictor is multi-calibrated with respect to $\mathcal{C}$ if it is simultaneously calibrated on each set in $\mathcal{C}$. We initiate the study of the construction of scaffolding sets, a small collection $\mathcal{S}$ of sets with the property that multi-calibration with respect to $\mathcal{S}$ ensures correctness, and not just calibration, of the predictor. Our approach is inspired by the folk wisdom that the intermediate layers of a neural net learn a highly structured and useful data representation.
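下面用一个自包含的玩具示例(本文的演示,并非论文代码;分桶校准检验只是多重校准的一种常见经验度量)说明"在集族中的每个集合上同时校准"的含义:对集族中的每个子集,按预测值分桶比较平均预测与经验频率,取最坏偏差作为多重校准误差。示例还显示,一个在全体上校准的常数预测器在子群上可能并不校准。

```python
def calibration_gap(preds, labels, subset, n_bins=10):
    """限制在给定下标子集上,各预测值分桶内 |平均预测 - 经验频率| 的最大值。"""
    gaps = []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i in subset
               if lo <= preds[i] < hi or (b == n_bins - 1 and preds[i] == 1.0)]
        if idx:
            p_bar = sum(preds[i] for i in idx) / len(idx)
            y_bar = sum(labels[i] for i in idx) / len(idx)
            gaps.append(abs(p_bar - y_bar))
    return max(gaps) if gaps else 0.0

def multicalibration_error(preds, labels, collection, n_bins=10):
    """集族中所有集合上校准偏差的最坏值。"""
    return max(calibration_gap(preds, labels, S, n_bins) for S in collection)

# 玩具数据:预测值恰等于各子群的经验基率 -> 在每个子集上都校准
preds = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
C = [list(range(8)), [0, 1, 2, 3], [4, 5, 6, 7]]
err = multicalibration_error(preds, labels, C)            # 0.0

# 常数预测器在全体上校准(总体基率 0.5),但在两个子群上失准
preds_const = [0.5] * 8
err_const = multicalibration_error(preds_const, labels, C)  # 0.25
```

这正是多重校准强于普通校准之处:它对集族中的每个子群都施加同样的要求。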

【44】 Generative Adversarial Network for Probabilistic Forecast of Random Dynamical System 标题:随机动力系统概率预测的产生式对抗性网络 链接:https://arxiv.org/abs/2111.03126

作者:Kyongmin Yeo,Zan Li,Wesley M. Gifford 机构: Rensselaer Polytechnic Institute 摘要:我们提出了一个深度学习模型,用于无分布假设的随机动力系统的数据驱动模拟。深度学习模型包括一个用于学习时间推进结构的递归神经网络和一个用于学习和采样随机动力系统概率分布的生成对抗网络。尽管生成性对抗网络为复杂概率分布建模提供了强有力的工具,但如果没有适当的正则化,训练往往会失败。在这里,我们提出了一种基于序列推理问题一致性条件的生成性对抗网络正则化策略。首先,使用最大平均差异(MMD)来加强随机过程的条件分布和边缘分布之间的一致性。然后,使用MMD或多个鉴别器对多步预测的边缘分布进行正则化。利用三个具有复杂噪声结构的随机过程研究了该模型的行为。 摘要:We present a deep learning model for data-driven simulations of random dynamical systems without a distributional assumption. The deep learning model consists of a recurrent neural network, which aims to learn the time marching structure, and a generative adversarial network to learn and sample from the probability distribution of the random dynamical system. Although generative adversarial networks provide a powerful tool to model a complex probability distribution, the training often fails without a proper regularization. Here, we propose a regularization strategy for a generative adversarial network based on consistency conditions for the sequential inference problems. First, the maximum mean discrepancy (MMD) is used to enforce the consistency between conditional and marginal distributions of a stochastic process. Then, the marginal distributions of the multiple-step predictions are regularized by using MMD or from multiple discriminators. The behavior of the proposed model is studied by using three stochastic processes with complex noise structures.
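下面是最大平均差异(MMD)的一个自包含示意实现(RBF 核、有偏估计量;仅为说明该统计量如何区分分布,并非论文中的训练代码,核带宽与样本量均为假设):同一分布的两个样本 MMD² 接近零,分布不同的样本 MMD² 明显为正——论文正是用这一统计量来约束随机过程条件分布与边缘分布的一致性。

```python
import math
import random

def rbf(x, y, bw=1.0):
    return math.exp(-(x - y) ** 2 / (2 * bw ** 2))

def mmd2(xs, ys, bw=1.0):
    """MMD^2 的有偏估计:||mu_P - mu_Q||^2(RKHS 中经验均值嵌入之差),恒非负。"""
    m, n = len(xs), len(ys)
    kxx = sum(rbf(a, b, bw) for a in xs for b in xs) / (m * m)
    kyy = sum(rbf(a, b, bw) for a in ys for b in ys) / (n * n)
    kxy = sum(rbf(a, b, bw) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2 * kxy

random.seed(0)
same = [random.gauss(0, 1) for _ in range(200)]
also_same = [random.gauss(0, 1) for _ in range(200)]
shifted = [random.gauss(3, 1) for _ in range(200)]

close = mmd2(same, also_same)  # 同分布:接近 0
far = mmd2(same, shifted)      # 均值偏移 3:明显为正
```

在 GAN 正则化中,这个量(或其可微近似)会作为惩罚项加入生成器的损失。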

【45】 Binary perceptron: efficient algorithms can find solutions in a rare well-connected cluster 标题:二进制感知器:高效的算法可以在罕见的连接良好的集群中找到解决方案 链接:https://arxiv.org/abs/2111.03084

作者:Emmanuel Abbe,Shuangping Li,Allan Sly 摘要:最近的研究表明,即使在低约束密度下,对称二元感知器中几乎所有的解都是孤立的,这表明找到典型解是困难的。相比之下,经验上一些算法在低密度下能够成功找到解。数值研究将这一现象归因于存在次优势且稠密连通的解区域,简单的学习算法可以到达这些区域。在本文中,我们对对称和非对称二元感知器正式建立了这一现象。我们证明,在低约束密度下(等价地,对于过参数化的感知器),确实存在一个直径几乎达到最大的次优势连通解簇,并且一个高效的多尺度多数算法能够以高概率在这样的簇中找到解,特别是解决了Perkins-Xu '21提出的一个公开问题。此外,即使在接近临界阈值处,我们也证明对称感知器存在线性直径的解簇,在附加假设下非对称感知器亦然。 摘要:It was recently shown that almost all solutions in the symmetric binary perceptron are isolated, even at low constraint densities, suggesting that finding typical solutions is hard. In contrast, some algorithms have been shown empirically to succeed in finding solutions at low density. This phenomenon has been justified numerically by the existence of subdominant and dense connected regions of solutions, which are accessible by simple learning algorithms. In this paper, we establish formally such a phenomenon for both the symmetric and asymmetric binary perceptrons. We show that at low constraint density (equivalently for overparametrized perceptrons), there exists indeed a subdominant connected cluster of solutions with almost maximal diameter, and that an efficient multiscale majority algorithm can find solutions in such a cluster with high probability, settling in particular an open problem posed by Perkins-Xu '21. In addition, even close to the critical threshold, we show that there exist clusters of linear diameter for the symmetric perceptron, as well as for the asymmetric perceptron under additional assumptions.
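作为对称二元感知器模型本身的一个小示意(本文的演示,并非论文的多尺度多数算法;维度 n、约束数 m 与阈值 κ 均为假设的玩具取值),下面用暴力枚举找出满足全部约束 |⟨a_j, x⟩| ≤ κ·√n 的 ±1 向量;由于约束只依赖 |⟨a_j, x⟩|,解集在取反 x ↦ −x 下对称。

```python
import itertools
import math
import random

def satisfies(a_rows, x, kappa):
    """对称二元感知器约束:每个高斯向量 a_j 满足 |<a_j, x>| <= kappa*sqrt(n)。"""
    n = len(x)
    return all(abs(sum(ai * xi for ai, xi in zip(row, x))) <= kappa * math.sqrt(n)
               for row in a_rows)

random.seed(1)
n, m, kappa = 10, 5, 1.0  # 低约束密度 m/n = 0.5(玩具规模)
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
sols = [x for x in itertools.product((-1, 1), repeat=n) if satisfies(A, x, kappa)]
```

当然,n=10 时可以枚举;论文关心的正是 n 很大、枚举不可行时,高效算法能否落入一个连通的解簇。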

【46】 Operator Augmentation for Model-based Policy Evaluation 标题:基于模型的策略评估中的算子增强 链接:https://arxiv.org/abs/2110.12658

作者:Xun Tang,Lexing Ying,Yuhua Zhu 机构:Department of Mathematics and Institute for Computational and Mathematical Engineering 摘要:在基于模型的强化学习中,转移矩阵和奖励向量通常是从带噪声的随机样本中估计出来的。即使估计模型是真实基础模型的无偏估计,由估计模型计算出的值函数也是有偏的。我们引入了一种算子增广方法来减小估计模型带来的误差。当以残差范数度量误差时,我们证明增广因子始终为正,且上界为 $1 + O(1/n)$,其中 $n$ 是学习转移矩阵每一行所用的样本数。我们还提出了一种实现算子增广的实用数值算法。 摘要:In model-based reinforcement learning, the transition matrix and reward vector are often estimated from random samples subject to noise. Even if the estimated model is an unbiased estimate of the true underlying model, the value function computed from the estimated model is biased. We introduce an operator augmentation method for reducing the error introduced by the estimated model. When the error is in the residual norm, we prove that the augmentation factor is always positive and upper bounded by $1 + O(1/n)$, where $n$ is the number of samples used in learning each row of the transition matrix. We also propose a practical numerical algorithm for implementing the operator augmentation.
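下面是一个自包含的小例子(本文的示意,并非论文的增广算法;转移矩阵与奖励均为假设的玩具数值),演示摘要中的背景事实:策略的值函数满足不动点方程 V = r + γPV,这里用不动点迭代在一个两状态马氏链上求解;把真实转移矩阵换成带采样噪声的估计矩阵后,解出的值函数相对真值出现偏差——这正是算子增广要减小的误差。

```python
def policy_eval(P, r, gamma=0.9, iters=2000):
    """不动点迭代求解 V = r + gamma * P V(小规模马氏链)。"""
    n = len(r)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[i] + gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v

P_true = [[0.9, 0.1], [0.2, 0.8]]  # 真实转移矩阵(玩具例子)
r = [1.0, 0.0]                     # 只有状态 0 有奖励
v_true = policy_eval(P_true, r)

P_hat = [[0.8, 0.2], [0.3, 0.7]]   # 带采样噪声的估计转移矩阵
v_hat = policy_eval(P_hat, r)      # 由估计模型得到的(有偏)值函数
```

由于 V 对 P 非线性地依赖(通过 (I − γP)⁻¹),即便 P̂ 无偏,V(P̂) 一般也不是 V(P) 的无偏估计。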

机器翻译,仅供参考