


Statistics arXiv Digest [12.8]


stat (Statistics): 43 papers in total

【1】 Change-point regression with a smooth additive disturbance
Link: https://arxiv.org/abs/2112.03878

Authors: Florian Pein
Affiliations: Lancaster University
Abstract: We assume a nonparametric regression model with signals given by the sum of a piecewise constant function and a smooth function. To detect the change-points and estimate the regression functions, we propose PCpluS, a combination of the fused Lasso and kernel smoothing. In contrast to existing approaches, it explicitly uses the assumption that the signal can be decomposed into a piecewise constant and a smooth function when detecting change-points. This is motivated by several applications and by theoretical results about partial linear models. Tuning parameters are selected by cross-validation. We argue that in this setting minimizing the L1-loss is superior to minimizing the L2-loss. We also highlight important consequences for cross-validation in piecewise constant change-point regression. Simulations demonstrate that our approach has a small average mean squared error and detects change-points well, and we apply the methodology to genome sequencing data to detect copy number variations. Finally, we demonstrate its flexibility by combining it with smoothing splines and by proposing extensions to multivariate and filtered data.
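
The piecewise-constant-plus-smooth decomposition lends itself to a backfitting loop: alternately fit the piecewise-constant part by a fused-lasso solve on the current residuals and the smooth part by a kernel smoother. The sketch below is a minimal illustration of that reading (using cvxpy for the fused lasso), not the authors' PCpluS implementation; the paper's loss choices and cross-validation are not reproduced, and all tuning values are placeholders.

```python
import numpy as np
import cvxpy as cp

def fused_lasso_1d(r, lam):
    """Piecewise-constant fit: argmin_b 0.5*||r - b||^2 + lam*||diff(b)||_1."""
    b = cp.Variable(len(r))
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(r - b)
                           + lam * cp.norm1(cp.diff(b)))).solve()
    return b.value

def kernel_smooth(x, r, bandwidth):
    """Nadaraya-Watson estimate of the smooth component on residuals r."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return w @ r / w.sum(axis=1)

def pc_plus_s(x, y, lam=1.0, bandwidth=0.1, n_iter=20):
    piecewise, smooth = np.zeros_like(y), np.zeros_like(y)
    for _ in range(n_iter):                  # backfitting loop
        piecewise = fused_lasso_1d(y - smooth, lam)
        smooth = kernel_smooth(x, y - piecewise, bandwidth)
    return piecewise, smooth
```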

【2】 Analyzing Highly Correlated Chemical Toxicants Associated with Time to Pregnancy Using Discrete Survival Frailty Modeling Via Elastic Net
Link: https://arxiv.org/abs/2112.03762

Authors: Abhisek Saha, Rajeshwari Sundaram
Affiliations: Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
Abstract: Understanding the association between mixtures of environmental toxicants and time-to-pregnancy (TTP) is an important scientific question, as sufficient evidence has emerged about the impact of individual toxicants on reproductive health, and individuals are exposed to a whole host of toxicants rather than a single one. Assessing the effects of chemical mixtures on TTP poses significant statistical challenges, namely (i) TTP being a discrete survival outcome, typically subject to left truncation and right censoring, (ii) chemical exposures being strongly correlated, (iii) accounting for some chemicals that bind to lipids, (iv) non-linear effects of some chemicals, and (v) a high percentage of concentrations below the limit of detection (LOD) for some chemicals. We propose a discrete frailty modeling framework (named Discnet) that allows selection of correlated exposures while addressing the issues mentioned above. Discnet is shown to have better and more stable false-negative and false-positive rates compared to alternative methods in various simulation settings. We performed a detailed analysis of the LIFE Study, pertaining to polychlorinated biphenyls and time-to-pregnancy, and found that older age, female exposure to cotinine (smoking), and DDT conferred a delay in getting pregnant, which was consistent across prior sensitivity analyses accounting for LOD as well as non-linear associations.

【3】 Outpatient Diversion using Real-Time Length-of-Stay Predictions
Link: https://arxiv.org/abs/2112.03761

Authors: Najiya Fatma, Varun Ramamohan
Abstract: In this work, we show how real-time length-of-stay (LOS) predictions can be used to divert outpatients from their assigned facility to other facilities with less congestion. We illustrate the implementation of this diversion mechanism for two primary health centers (PHCs), wherein we divert patients from their assigned PHC to the other PHC based on their predicted LOSs in both facilities. We develop a discrete-event simulation model of patient flow operations at these two PHCs in an Indian district and observe significantly longer LOSs at one of the PHCs due to disparities in the patient loads across the two PHCs. We first determine the expected LOS of a patient at the point in time at which they are expected to arrive at a PHC, using system state information recorded at the current time at the PHC in question. The real-time LOS predictions are generated by estimating patient wait times on a real-time basis at the queueing subsystems within the PHC. We then divert the patient to the appropriate PHC on the basis of the predicted LOS estimates at both PHCs, and show through simulation that the proposed framework leads to more equitable utilization of the resources involved in the provision of outpatient care.

【4】 Equity in Stochastic Healthcare Facility Location
Link: https://arxiv.org/abs/2112.03760

Authors: Karmel S. Shehadeh, Lawrence V. Snyder
Affiliations: Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA, USA
Abstract: We consider issues of equity in stochastic facility location models for healthcare applications. We explore how uncertainty exacerbates inequity and examine several equity measures that can be used for stochastic healthcare location modeling. We analyze the limited literature on this subject and highlight areas of opportunity for developing tractable, reliable, and data-driven approaches that might be applicable within and outside healthcare operations. Our primary focus is on exploring various ways to model uncertainty, equity, and facility location, including modeling aspects (e.g., tractability and accuracy) and outcomes (e.g., equity/fairness/access performance metrics vs. traditional metrics like cost and service levels).

【5】 A Comparison of Estimand and Estimation Strategies for Clinical Trials in Early Parkinson's Disease
Link: https://arxiv.org/abs/2112.03700

Authors: Alessandro Noci, Marcel Wolbers, Markus Abt, Corine Baayen, Hans Ulrich Burger, Man Jin, Weining Zhao Robieson
Affiliations: Data and Statistical Sciences, Pharma Development, Roche, Basel, Switzerland; Biometrics Division, H. Lundbeck A/S, Copenhagen, Denmark; Data and Statistical Sciences, AbbVie Inc., North Chicago, IL, USA
Note: Manuscript (19 pages, 3 figures, 3 tables) and supplementary Appendix (5 pages)
Abstract: Parkinson's disease (PD) is a chronic, degenerative neurological disorder. PD cannot be prevented, slowed or cured as of today, but highly effective symptomatic treatments are available. We consider relevant estimands and treatment effect estimators for randomized trials of a novel treatment which aims to slow down disease progression versus placebo in early, untreated PD. A commonly used endpoint in PD trials is the MDS-Unified Parkinson's Disease Rating Scale (MDS-UPDRS), which is longitudinally assessed at scheduled visits. The most important intercurrent events (ICEs) which affect the interpretation of the MDS-UPDRS are study treatment discontinuations and initiations of symptomatic treatment. Different estimand strategies are discussed, and hypothetical or treatment policy strategies, respectively, for different types of ICEs seem most appropriate in this context. Several estimators based on multiple imputation which target these estimands are proposed and compared in terms of bias, mean-squared error, and power in a simulation study. The investigated estimators include methods based on a missing-at-random (MAR) assumption, with and without the inclusion of time-varying ICE-indicators, as well as reference-based imputation methods. Simulation parameters are motivated by data analyses of a cohort study from the Parkinson's Progression Markers Initiative (PPMI).

【6】 Piecewise survival models: a change-point analysis on herpes zoster associated pain data revisited and extended
Link: https://arxiv.org/abs/2112.03688

Authors: Dimitra Eleftheriou, Dimitris Karlis
Affiliations: School of Mathematics and Statistics, University of Glasgow; Department of Statistics, Athens University of Economics and Business
Abstract: For many diseases it is reasonable to assume that the hazard rate is not constant across time, but rather changes over different time intervals. To capture this, we work here with a piecewise survival model. One of the major problems in such piecewise models is to determine the time points at which the hazard rate changes. From the practical point of view this can provide very important information, as it may reflect changes in the progress of a disease. We present piecewise Weibull regression models with covariates. The time points where change occurs are assumed unknown and need to be estimated. The equality of hazard rates across the distinct phases is also examined to verify the exact number of phases. An example based on herpes zoster data is used to demonstrate the usefulness of the developed methodology.
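
Read concretely (our notation, not necessarily the paper's parameterization), a piecewise Weibull proportional-hazards model with change-points $0=\tau_{0}<\tau_{1}<\dots<\tau_{K}$ takes $h(t\mid x)=\lambda_{k}\,\gamma_{k}\,t^{\gamma_{k}-1}\exp(x^{\top}\beta)$ for $t\in[\tau_{k-1},\tau_{k})$, $k=1,\dots,K$; the $\tau_{k}$ are estimated along with $(\lambda_{k},\gamma_{k},\beta)$, and testing $(\lambda_{k},\gamma_{k})=(\lambda_{k+1},\gamma_{k+1})$ checks whether adjacent phases can be merged.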

【7】 A generalization gap estimation for overparameterized models via Langevin functional variance
Link: https://arxiv.org/abs/2112.03660

Authors: Akifumi Okuno, Keisuke Yano
Affiliations: The Institute of Statistical Mathematics; RIKEN Center for Advanced Intelligence Project
Note: 21 pages, no figures
Abstract: This paper discusses estimating the generalization gap, the difference between the generalization error and the empirical error, for overparameterized models (e.g., neural networks). We first show that the functional variance, a key concept in defining a widely-applicable information criterion, characterizes the generalization gap even in overparameterized settings, where conventional theory cannot be applied. We next propose a computationally efficient approximation of the functional variance, the Langevin approximation of the functional variance (Langevin FV). This method leverages the first-order but not the second-order gradient of the squared loss function; so it can be computed efficiently and implemented consistently with gradient-based optimization algorithms. We demonstrate Langevin FV numerically in estimating the generalization gaps of overparameterized linear regression and non-linear neural network models.
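
Mechanically, a functional-variance estimate of this type can be assembled from Langevin samples around the trained optimum: sum the per-example variances of the loss terms over the chain. The sketch below is only a schematic of that recipe; the step size, inverse temperature, and the exact scaling of the gap estimate used in the paper are assumptions here, as are the helper functions.

```python
import numpy as np

def langevin_fv(grad_loss, per_example_loss, theta0, n, step=1e-4,
                inv_temp=None, n_samples=1000, burn_in=200, rng=None):
    """grad_loss(theta): full-loss gradient; per_example_loss(theta): (n,) array."""
    rng = np.random.default_rng(rng)
    beta = n if inv_temp is None else inv_temp   # assumption: beta scales with n
    theta, losses = np.asarray(theta0, float).copy(), []
    for t in range(burn_in + n_samples):
        noise = rng.standard_normal(theta.shape)
        # unadjusted Langevin step: first-order gradients only
        theta = theta - step * grad_loss(theta) + np.sqrt(2 * step / beta) * noise
        if t >= burn_in:
            losses.append(per_example_loss(theta))
    losses = np.stack(losses)                    # (n_samples, n)
    fv = losses.var(axis=0).sum()                # sum_i Var_theta[loss_i]
    return fv / n                                # gap estimate, up to scaling
```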

【8】 Understanding Square Loss in Training Overparametrized Neural Network Classifiers
Link: https://arxiv.org/abs/2112.03657

Authors: Tianyang Hu, Jun Wang, Wenjia Wang, Zhenguo Li
Affiliations: Huawei Noah's Ark Lab; HKUST
Abstract: Deep learning has achieved many breakthroughs in modern classification tasks. Numerous architectures have been proposed for different data structures, but when it comes to the loss function, the cross-entropy loss is the predominant choice. Recently, several alternative losses have seen revived interest for deep classifiers. In particular, empirical evidence seems to promote square loss, but a theoretical justification is still lacking. In this work, we contribute to the theoretical understanding of square loss in classification by systematically investigating how it performs for overparametrized neural networks in the neural tangent kernel (NTK) regime. Interesting properties regarding the generalization error, robustness, and calibration error are revealed. We consider two cases, according to whether the classes are separable or not. In the general non-separable case, fast convergence rates are established for both the misclassification rate and the calibration error. When the classes are separable, the misclassification rate improves to an exponentially fast rate. Further, the resulting margin is proven to be bounded away from zero, providing theoretical guarantees for robustness. We expect our findings to hold beyond the NTK regime and translate to practical settings. To this end, we conduct extensive empirical studies on practical neural networks, demonstrating the effectiveness of square loss on both synthetic low-dimensional data and real image data. Compared to cross-entropy, square loss has comparable generalization error but noticeable advantages in robustness and model calibration.

【9】 Bless and curse of smoothness and phase transitions in nonparametric regressions: a nonasymptotic perspective
Link: https://arxiv.org/abs/2112.03626

Authors: Ying Zhu
Note: 3 tables
Abstract: When the regression function belongs to the standard smooth classes consisting of univariate functions with derivatives up to the $(\gamma+1)$th order bounded by a common constant everywhere or a.e., it is well known that the minimax optimal rate of convergence in mean squared error (MSE) is $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ when $\gamma$ is finite and the sample size $n\rightarrow\infty$. From a nonasymptotic viewpoint that considers finite $n$, this paper shows that: for the standard Hölder and Sobolev classes, the minimax optimal rate is $\frac{\sigma^{2}\left(\gamma\vee1\right)}{n}$ when $\frac{n}{\sigma^{2}}\precsim\left(\gamma\vee1\right)^{2\gamma+3}$ and $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ when $\frac{n}{\sigma^{2}}\succsim\left(\gamma\vee1\right)^{2\gamma+3}$. To establish these results, we derive upper and lower bounds on the covering and packing numbers for the generalized Hölder class where the $k$th ($k=0,...,\gamma$) derivative is bounded from above by a parameter $R_{k}$ and the $\gamma$th derivative is $R_{\gamma+1}$-Lipschitz (and also for the generalized ellipsoid class of smooth functions). Our bounds sharpen the classical metric entropy results for the standard classes, and give the general dependence on $\gamma$ and $R_{k}$. By deriving the minimax optimal MSE rates under $R_{k}=1$, $R_{k}\leq\left(k-1\right)!$ and $R_{k}=k!$ (with the latter two cases motivated in our introduction) with the help of our new entropy bounds, we show a couple of interesting results that cannot be shown with the existing entropy bounds in the literature. For the Hölder class of $d$-variate functions, our result suggests that the classical asymptotic rate $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+2+d}}$ could be an underestimate of the MSE in finite samples.

【10】 Private Robust Estimation by Stabilizing Convex Relaxations
Link: https://arxiv.org/abs/2112.03548

Authors: Pravesh K. Kothari, Pasin Manurangsi, Ameya Velingker
Abstract: We give the first polynomial time and sample $(\epsilon, \delta)$-differentially private (DP) algorithm to estimate the mean, covariance and higher moments in the presence of a constant fraction of adversarial outliers. Our algorithm succeeds for families of distributions that satisfy two well-studied properties in prior works on robust estimation: certifiable subgaussianity of directional moments and certifiable hypercontractivity of degree-2 polynomials. Our recovery guarantees hold in the "right affine-invariant norms": Mahalanobis distance for the mean, multiplicative spectral and relative Frobenius distance guarantees for covariance, and injective norms for higher moments. Prior works obtained private robust algorithms for mean estimation of subgaussian distributions with bounded covariance. For covariance estimation, ours is the first efficient algorithm (even in the absence of outliers) that succeeds without any condition-number assumptions. Our algorithms arise from a new framework that provides a general blueprint for modifying convex relaxations for robust estimation to satisfy strong worst-case stability guarantees in the appropriate parameter norms whenever the algorithms produce witnesses of correctness in their run. We verify such guarantees for a modification of standard sum-of-squares (SoS) semidefinite programming relaxations for robust estimation. Our privacy guarantees are obtained by combining stability guarantees with a new "estimate dependent" noise injection mechanism in which noise scales with the eigenvalues of the estimated covariance. We believe this framework will be useful more generally in obtaining DP counterparts of robust estimators. Independently of our work, Ashtiani and Liaw [AL21] also obtained a polynomial time and sample private robust estimation algorithm for Gaussian distributions.

【11】 A Function-Based Approach to Model the Measurement Error in Wearable Devices
Link: https://arxiv.org/abs/2112.03539

Authors: Sneha Jadhav, Carmen D. Tekwe, Yuanyuan Luan
Affiliations: Department of Mathematics and Statistics, Wake Forest University; Department of Epidemiology and Biostatistics, Indiana University
Abstract: Physical activity (PA) is an important risk factor for many health outcomes. Wearable devices such as accelerometers are increasingly used in biomedical studies to understand the associations between PA and health outcomes. Statistical analyses involving accelerometer data are challenging due to three characteristics: (i) high dimensionality, (ii) temporal dependence, and (iii) measurement error. To address these challenges, we treat accelerometer-based measures of physical activity as a single function-valued covariate prone to measurement error. Specifically, in order to determine the relationship between PA and a health outcome of interest, we propose a regression model with a functional covariate that accounts for measurement error. Using regression calibration, we develop a two-step estimation method for the model parameters and establish their consistency. A test is also proposed to assess the significance of the estimated model parameters. Simulation studies are conducted to compare the proposed methods with existing alternative approaches under varying scenarios. Finally, the developed methods are used to assess the relationship between PA intensity and BMI using data from the National Health and Nutrition Examination Survey.

【12】 A Unifying Bayesian Approach for Sample Size Determination Using Design and Analysis Priors
Link: https://arxiv.org/abs/2112.03509

Authors: Jane Pan, Sudipto Banerjee
Affiliations: Department of Biostatistics, University of California, Los Angeles (UCLA)
Abstract: Power and sample size analysis comprises a critical component of clinical trial study design. There is an extensive collection of methods addressing this problem from diverse perspectives. The Bayesian paradigm, in particular, has attracted noticeable attention and includes different perspectives for sample size determination. Building upon a cost-effectiveness analysis undertaken by O'Hagan and Stevens (2001) with different priors in the design and analysis stages, we develop a general Bayesian framework for simulation-based sample size determination that can be easily implemented on modest computing architectures. We further qualify the need for different priors for the design and analysis stages. We work primarily in the context of conjugate Bayesian linear regression models, where we consider the situations with known and unknown variances. Throughout, we draw parallels with frequentist solutions, which arise as special cases, and with alternative Bayesian approaches, with an emphasis on how the numerical results from existing methods arise as special cases in our framework.
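
A minimal simulation-based version of the design-prior/analysis-prior split, for a conjugate normal model with known variance (all priors, thresholds, and targets below are illustrative, not the paper's):

```python
import numpy as np
from scipy import stats

def bayesian_assurance(n, sigma=1.0, design_mu=0.3, design_sd=0.1,
                       analysis_mu=0.0, analysis_sd=10.0,
                       threshold=0.975, n_sim=10_000, rng=0):
    rng = np.random.default_rng(rng)
    successes = 0
    for _ in range(n_sim):
        theta = rng.normal(design_mu, design_sd)        # design-prior draw
        ybar = rng.normal(theta, sigma / np.sqrt(n))    # simulated trial mean
        # conjugate normal-normal update under the analysis prior
        post_var = 1 / (1 / analysis_sd**2 + n / sigma**2)
        post_mu = post_var * (analysis_mu / analysis_sd**2 + n * ybar / sigma**2)
        if 1 - stats.norm.cdf(0, post_mu, np.sqrt(post_var)) > threshold:
            successes += 1
    return successes / n_sim   # Bayesian "power" (assurance) at sample size n

# sample size determination: pick the smallest n whose assurance clears, say, 0.8
```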

【13】 Training Deep Models to be Explained with Fewer Examples
Link: https://arxiv.org/abs/2112.03508

Authors: Tomoharu Iwata, Yuya Yoshikawa
Affiliations: NTT Communication Science Laboratories; Software Technology and Artificial Intelligence Research Laboratory, Chiba Institute of Technology
Abstract: Although deep models achieve high predictive performance, it is difficult for humans to understand the predictions they make. Explainability is important for real-world applications to justify their reliability. Many example-based explanation methods have been proposed, such as representer point selection, where an explanation model defined by a set of training examples is used to explain a prediction model. For improving interpretability, reducing the number of examples in the explanation model is important. However, explanations with fewer examples can be unfaithful, since it is difficult to approximate prediction models well by such example-based explanation models. Unfaithful explanations mean that the predictions of the explainable model differ from those of the prediction model. We propose a method for training deep models such that their predictions are faithfully explained by explanation models with a small number of examples. We train the prediction and explanation models simultaneously with a sparse regularizer for reducing the number of examples. The proposed method can be incorporated into any neural network-based prediction model. Experiments using several datasets demonstrate that the proposed method improves faithfulness while keeping the predictive performance.

【14】 Conformal Sensitivity Analysis for Individual Treatment Effects
Link: https://arxiv.org/abs/2112.03493

Authors: Mingzhang Yin, Claudia Shi, Yixin Wang, David M. Blei
Abstract: Estimating an individual treatment effect (ITE) is essential to personalized decision making. However, existing methods for estimating the ITE often rely on unconfoundedness, an assumption that is fundamentally untestable with observed data. To this end, this paper proposes a method for sensitivity analysis of the ITE, a way to estimate a range of the ITE under unobserved confounding. The method we develop quantifies unmeasured confounding through a marginal sensitivity model [Ros2002, Tan2006], and then adapts the framework of conformal inference to estimate an ITE interval at a given confounding strength. In particular, we formulate this sensitivity analysis problem as one of conformal inference under distribution shift, and we extend existing methods of covariate-shifted conformal inference to this more general setting. The result is a predictive interval that has guaranteed nominal coverage of the ITE, a method that provides coverage with distribution-free and nonasymptotic guarantees. We evaluate the method on synthetic data and illustrate its application in an observational study.
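
The covariate-shift building block can be stated compactly: with weights $w(x)$ on calibration points, the interval uses a weighted quantile of the nonconformity scores, with the test point carrying an atom at infinity. A sketch of that step, assuming the weights are supplied externally (in the paper they would come from the marginal sensitivity model):

```python
import numpy as np

def weighted_conformal_interval(mu, scores_cal, w_cal, x_new, w_new, alpha=0.1):
    """mu: point predictor; scores_cal: nonconformity scores |y_i - mu(x_i)|
    on calibration data; w_cal, w_new: shift weights for calibration/test."""
    scores = np.append(scores_cal, np.inf)      # test point gets an inf atom
    p = np.append(w_cal, w_new)
    p = p / p.sum()                             # normalized weights
    order = np.argsort(scores)
    cum = np.cumsum(p[order])
    # smallest score whose cumulative weight reaches level 1 - alpha
    q = scores[order][np.searchsorted(cum, 1 - alpha)]
    pred = mu(x_new)
    return pred - q, pred + q
```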

【15】 Mesh-Based Solutions for Nonparametric Penalized Regression
Link: https://arxiv.org/abs/2112.03428

Authors: Brayan Ortiz, Noah Simon
Affiliations: Department of Biostatistics, University of Washington
Note: 29 pages, 4 figures
Abstract: It is often of interest to estimate regression functions non-parametrically. Nonparametric penalized regression (NPR) is one statistically effective, well-studied solution to this problem. Unfortunately, in many cases, finding exact solutions to NPR problems is computationally intractable. In this manuscript, we propose a mesh-based approximate solution (MBS) for those scenarios. MBS transforms the complicated functional minimization of NPR into a finite-parameter, discrete convex minimization, and allows us to leverage the tools of modern convex optimization. We show applications of MBS in a number of explicit examples (including both uni- and multi-variate regression), and explore how the number of parameters must increase with the sample size in order for MBS to maintain the rate-optimality of NPR. We also give an efficient algorithm to minimize the MBS objective while effectively leveraging the sparsity inherent in MBS.

【16】 Using Image Transformations to Learn Network Structure
Link: https://arxiv.org/abs/2112.03419

Authors: Brayan Ortiz, Amitabh Sinha
Note: 11 pages, 6 figures, 5 tables; in submission with the International Journal of Data Science and Analytics, Special Issue: Domain Driven Data Mining
Abstract: Many learning tasks require observing a sequence of images and making a decision. In a transportation problem of designing and planning for shipping boxes between nodes, we show how to treat the network of nodes and the flows between them as images. These images have useful structural information that can be statistically summarized. Using image compression techniques, we reduce an image down to a set of numbers that contain interpretable geographic information that we call geographic signatures. Using geographic signatures, we learn network structure that can be utilized to recommend future network connectivity. We develop a Bayesian reinforcement algorithm that takes advantage of statistically summarized network information as priors and of user decisions to reinforce an agent's probabilistic decisions.

【17】 On the computation of a non-parametric estimator by convex optimization
Link: https://arxiv.org/abs/2112.03390

Authors: Akshay Seshadri, Stephen Becker
Affiliations: Department of Physics, University of Colorado Boulder; Department of Applied Mathematics, University of Colorado Boulder
Note: 10 pages, no figures
Abstract: Estimation of linear functionals from observed data is an important task in many subjects. Juditsky & Nemirovski [The Annals of Statistics 37.5A (2009): 2278-2300] propose a framework for non-parametric estimation of linear functionals in a very general setting, with nearly minimax optimal confidence intervals. They compute this estimator and the associated confidence interval by approximating the saddle point of a function. While this optimization problem is convex, it is rather difficult to solve using existing off-the-shelf optimization software. Furthermore, this computation can be expensive when the estimators live in a high-dimensional space. We propose a different algorithm to construct this estimator. Our algorithm can be used with existing optimization software and is much cheaper to implement even when the estimators are in a high-dimensional space, as long as the Hellinger affinity (or the Bhattacharyya coefficient) for the chosen parametric distribution can be efficiently computed given the parameters. We hope that our algorithm will foster the adoption of this estimation technique for a wider variety of problems with relative ease.

【18】 Using principal stratification in analysis of clinical trials
Link: https://arxiv.org/abs/2112.03352

Authors: Ilya Lipkovich, Bohdana Ratitch, Yongming Qu, Xiang Zhang, Mingyang Shan, Craig Mallinckrodt
Affiliations: Eli Lilly and Company, Indianapolis, Indiana, USA; Bayer, Montreal, QC, Canada; CSL Behring, King of Prussia, PA, USA; Cortexyme, San Francisco, CA, USA
Abstract: The ICH E9(R1) addendum (2019) proposed principal stratification (PS) as one of five strategies for dealing with intercurrent events. Therefore, understanding the strengths, limitations, and assumptions of PS is important for the broad community of clinical trialists. Many approaches have been developed under the general framework of PS in different areas of research, including experimental and observational studies. These diverse applications have utilized a diverse set of tools and assumptions. Thus, a need exists to present these approaches in a unifying manner. The goal of this tutorial is threefold. First, we provide a coherent and unifying description of PS. Second, we emphasize that estimation of effects within PS relies on strong assumptions, and we thoroughly examine the consequences of these assumptions to understand in which situations certain assumptions are reasonable. Finally, we provide an overview of a variety of key methods for PS analysis and use a real clinical trial example to illustrate them. Examples of code for implementation of some of these approaches are given in the supplemental materials.

【19】 Posterior Predictive Null Checks
Link: https://arxiv.org/abs/2112.03333

Authors: Gemma E. Moran, John P. Cunningham, David M. Blei
Affiliations: Data Science Institute, Columbia University; Department of Statistics, Columbia University; Department of Computer Science, Columbia University
Abstract: Bayesian model criticism is an important part of the practice of Bayesian statistics. Traditionally, model criticism methods have been based on the predictive check, an adaptation of goodness-of-fit testing to Bayesian modeling and an effective method to understand how well a model captures the distribution of the data. In modern practice, however, researchers iteratively build and develop many models, exploring a space of models to help solve the problem at hand. While classical predictive checks can help assess each one, they cannot help the researcher understand how the models relate to each other. This paper introduces the posterior predictive null check (PPN), a method for Bayesian model criticism that helps characterize the relationships between models. The idea behind the PPN is to check whether data from one model's predictive distribution can pass a predictive check designed for another model. This form of criticism complements the classical predictive check by providing a comparative tool. A collection of PPNs, which we call a PPN study, can help us understand which models are equivalent and which models provide different perspectives on the data. With mixture models, we demonstrate how a PPN study, along with traditional predictive checks, can help select the number of components by the principle of parsimony. With probabilistic factor models, we demonstrate how a PPN study can help understand the relationships between different classes of models, such as linear models and models based on neural networks. Finally, we analyze data from the literature on predictive checks to show how a PPN study can improve the practice of Bayesian model criticism. Code to replicate the results in this paper is available at https://github.com/gemoran/ppn-code.

【20】 Bayesian Structural Equation Modeling in Multiple Omics Data Integration with Application to Circadian Genes
Link: https://arxiv.org/abs/2112.03330

Authors: Arnab Kumar Maity, Sang Chan Lee, Bani K. Mallick, Tapasree Roy Sarkar
Affiliations: Early Clinical Development Oncology Statistics, Pfizer Inc., San Diego, USA; Department of Statistics, Texas A&M University, College Station, USA; Department of Biology, Texas A&M University, College Station, USA
Abstract: It is well known that integration among different data sources is reliable because of its potential to unveil new functionalities of genomic expression that might remain dormant in a single-source analysis. Moreover, different studies have justified the more powerful analyses afforded by multi-platform data. Toward this, in this study, we consider the omics profiles of circadian genes, such as copy number changes and RNA sequence data, along with their survival response. We develop a Bayesian structural equation model coupled with linear regressions and a log-normal accelerated failure time regression to integrate the information between these two platforms and to predict the survival of the subjects. We place conjugate priors on the regression parameters and derive the Gibbs sampler using their conditional distributions. Our extensive simulation study shows that the integrative model provides a better fit to the data than its closest competitor. The analyses of glioblastoma cancer data and breast cancer data from TCGA, the largest genomics and transcriptomics database, support our findings. The developed method is wrapped in the R package semmcmc, available on CRAN.

【21】 Lattice-Based Methods Surpass Sum-of-Squares in Clustering
Link: https://arxiv.org/abs/2112.03898

Authors: Ilias Zadik, Min Jae Song, Alexander S. Wein, Joan Bruna
Affiliations: Department of Mathematics, Massachusetts Institute of Technology; Courant Institute of Mathematical Sciences, New York University; Simons Institute for the Theory of Computing, UC Berkeley; Center for Data Science, New York University
Abstract: Clustering is a fundamental primitive in unsupervised learning which gives rise to a rich class of computationally-challenging inference tasks. In this work, we focus on the canonical task of clustering $d$-dimensional Gaussian mixtures with unknown (and possibly degenerate) covariance. Recent works (Ghosh et al. '20; Mao, Wein '21; Davis, Diaz, Wang '21) have established lower bounds against the class of low-degree polynomial methods and the sum-of-squares (SoS) hierarchy for recovering certain hidden structures planted in Gaussian clustering instances. Prior work on many similar inference tasks portends that such lower bounds strongly suggest the presence of an inherent statistical-to-computational gap for clustering, that is, a parameter regime where the clustering task is statistically possible but no polynomial-time algorithm succeeds. One special case of the clustering task we consider is equivalent to the problem of finding a planted hypercube vector in an otherwise random subspace. We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap, even though the aforementioned low-degree and SoS lower bounds continue to apply in this case. To achieve this, we give a polynomial-time algorithm based on the Lenstra-Lenstra-Lovász lattice basis reduction method which achieves the statistically-optimal sample complexity of $d+1$ samples. This result extends the class of problems whose conjectured statistical-to-computational gaps can be "closed" by "brittle" polynomial-time algorithms, highlighting the crucial but subtle role of noise in the onset of statistical-to-computational gaps.

【22】 Efficient Calibration of Multi-Agent Market Simulators from Time Series with Bayesian Optimization
Link: https://arxiv.org/abs/2112.03874

Authors: Yuanlu Bai, Henry Lam, Svitlana Vyetrenko, Tucker Balch
Affiliations: Columbia University, USA; J.P. Morgan AI Research, USA
Abstract: Multi-agent market simulation is commonly used to create an environment for downstream machine learning or reinforcement learning tasks, such as training or testing trading strategies before deploying them to real-time trading. In electronic trading markets, only the price or volume time series that result from the interaction of multiple market participants are typically directly observable. Therefore, multi-agent market environments need to be calibrated so that the time series resulting from the interaction of simulated agents resemble historical ones, which amounts to solving a highly complex large-scale optimization problem. In this paper, we propose a simple and efficient framework for calibrating multi-agent market simulator parameters from historical time series observations. First, we consider a novel concept of an eligibility set to bypass the potential non-identifiability issue. Second, we generalize the two-sample Kolmogorov-Smirnov (K-S) test with Bonferroni correction to test the similarity between two high-dimensional time series distributions, which gives a simple yet effective distance metric between time series sample sets. Third, we suggest using Bayesian optimization (BO) and trust-region BO (TuRBO) to minimize the aforementioned distance metric. Finally, we demonstrate the efficiency of our framework using numerical experiments.
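
A rough shape of the calibration loop, assuming a `simulate(params)` stand-in for the market simulator and using scikit-optimize's `gp_minimize` for plain BO (TuRBO is not sketched; the distance here is the vanilla coordinate-wise K-S version with a Bonferroni-corrected p-value, not the paper's generalized test):

```python
import numpy as np
from scipy.stats import ks_2samp
from skopt import gp_minimize          # scikit-optimize

def ks_distance(sim_paths, hist_paths):
    """Max K-S statistic across coordinates, plus the smallest
    Bonferroni-corrected p-value across the coordinate-wise tests."""
    d = hist_paths.shape[1]
    tests = [ks_2samp(sim_paths[:, j], hist_paths[:, j]) for j in range(d)]
    worst_stat = max(t.statistic for t in tests)
    min_p_bonf = min(t.pvalue for t in tests) * d   # Bonferroni correction
    return worst_stat, min_p_bonf

def calibrate(simulate, hist_paths, bounds, n_calls=50):
    def objective(params):
        sim = simulate(params)                      # (n_paths, d) array
        return ks_distance(sim, hist_paths)[0]
    return gp_minimize(objective, bounds, n_calls=n_calls)
```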

【23】 Nonparametric Treatment Effect Identification in School Choice
Link: https://arxiv.org/abs/2112.03872

Authors: Jiafeng Chen
Note: Presented at SOLE 2021
Abstract: We study identification and estimation of treatment effects in common school choice settings, under unrestricted heterogeneity in individual potential outcomes. We propose two notions of identification, corresponding to design- and sampling-based uncertainty, respectively. We characterize the set of causal estimands that are identified for a large variety of school choice mechanisms, including ones that feature both random and non-random tie-breaking; we discuss their policy implications. We also study the asymptotic behavior of nonparametric estimators for these causal estimands. Lastly, we connect our approach to the propensity score approach proposed in Abdulkadiroglu, Angrist, Narita, and Pathak (2017a, forthcoming), and derive the implicit estimands of the latter approach under fully heterogeneous treatment effects.

【24】 On the Effectiveness of Mode Exploration in Bayesian Model Averaging for Neural Networks
Link: https://arxiv.org/abs/2112.03773

Authors: John T. Holodnak, Allan B. Wollaber
Affiliations: Massachusetts Institute of Technology
Note: Presented at the ICML 2021 Workshop on Uncertainty and Robustness in Deep Learning
Abstract: Multiple techniques for producing calibrated predictive probabilities using deep neural networks in supervised learning settings have emerged that leverage approaches to ensemble diverse solutions discovered during cyclic training or training from multiple random starting points (deep ensembles). However, only a limited amount of work has investigated the utility of exploring the local region around each diverse solution (posterior mode). Using three well-known deep architectures on the CIFAR-10 dataset, we evaluate several simple methods for exploring local regions of the weight space with respect to Brier score, accuracy, and expected calibration error. We consider both Bayesian inference techniques (variational inference and Hamiltonian Monte Carlo applied to the softmax output layer) as well as utilizing the stochastic gradient descent trajectory near optima. While adding separate modes to the ensemble uniformly improves performance, we show that the simple mode exploration methods considered here produce little to no improvement over ensembles without mode exploration.

【25】 Machine Learning in the Search for New Fundamental Physics
Link: https://arxiv.org/abs/2112.03769

Authors: Georgia Karagiorgi, Gregor Kasieczka, Scott Kravitz, Benjamin Nachman, David Shih
Affiliations: Department of Physics, Columbia University, New York, NY, USA; Institut für Experimentalphysik, Universität Hamburg, Hamburg, Germany; Physics Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Note: Preprint of article submitted to Nature Reviews Physics; 19 pages, 1 figure
Abstract: Machine learning plays a crucial role in enhancing and accelerating the search for new fundamental physics. We review the state of machine learning methods and applications for new physics searches in the context of terrestrial high energy physics experiments, including the Large Hadron Collider, rare event searches, and neutrino experiments. While machine learning has a long history in these fields, the deep learning revolution (early 2010s) has yielded a qualitative shift in terms of the scope and ambition of research. These modern machine learning developments are the focus of the present review.

【26】 A Continuous-time Stochastic Gradient Descent Method for Continuous Data
Link: https://arxiv.org/abs/2112.03754

Authors: Kexin Jin, Jonas Latz, Chenguang Liu, Carola-Bibiane Schönlieb
Affiliations: Department of Mathematics, Princeton University, Princeton, NJ, USA; School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, United Kingdom; Delft Institute of Applied Mathematics, Technische Universiteit Delft, Delft, The Netherlands
Abstract: Optimization problems with continuous data appear in, e.g., robust machine learning, functional data analysis, and variational inference. Here, the target function is given as an integral over a family of (continuously) indexed target functions, integrated with respect to a probability measure. Such problems can often be solved by stochastic optimization methods: performing optimization steps with respect to the indexed target function with randomly switched indices. In this work, we study a continuous-time variant of the stochastic gradient descent algorithm for optimization problems with continuous data. This so-called stochastic gradient process consists in a gradient flow minimizing an indexed target function that is coupled with a continuous-time index process determining the index. Index processes are, e.g., reflected diffusions, pure jump processes, or other Lévy processes on compact spaces. Thus, we study multiple sampling patterns for the continuous data space and allow for data simulated or streamed at runtime of the algorithm. We analyze the approximation properties of the stochastic gradient process and study its longtime behavior and ergodicity under constant and decreasing learning rates. We end with illustrating the applicability of the stochastic gradient process in a polynomial regression problem with noisy functional data, as well as in a physics-informed neural network.
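
A discretized toy version makes the construction concrete: an Euler step on the gradient flow for the currently indexed target, with the index resampled at the jump times of a Poisson process, one of the index processes the abstract allows. The target family, jump rate, and learning-rate schedule below are illustrative only.

```python
import numpy as np

def stochastic_gradient_process(grad_f, theta0, T=10.0, dt=1e-3,
                                jump_rate=5.0, lr=lambda t: 1.0, rng=0):
    """grad_f(theta, s): gradient of the s-indexed target, index s in [0, 1]."""
    rng = np.random.default_rng(rng)
    theta, t = np.asarray(theta0, float), 0.0
    s = rng.uniform()                            # initial data index
    next_jump = rng.exponential(1 / jump_rate)
    while t < T:
        theta = theta - dt * lr(t) * grad_f(theta, s)   # gradient-flow step
        t += dt
        if t >= next_jump:                       # pure-jump index process
            s = rng.uniform()
            next_jump += rng.exponential(1 / jump_rate)
    return theta

# toy indexed family: f(theta, s) = (theta - sin(2*pi*s))^2
theta_hat = stochastic_gradient_process(
    lambda th, s: 2 * (th - np.sin(2 * np.pi * s)), theta0=0.0)
```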

【27】 Tell me why! -- Explanations support learning of relational and causal structure
Link: https://arxiv.org/abs/2112.03753

Authors: Andrew K. Lampinen, Nicholas A. Roy, Ishita Dasgupta, Stephanie C. Y. Chan, Allison C. Tam, James L. McClelland, Chen Yan, Adam Santoro, Neil C. Rabinowitz, Jane X. Wang, Felix Hill
Affiliations: DeepMind, London, UK
Note: 22 pages
Abstract: Explanations play a considerable role in human learning, especially in areas that remain major challenges for AI: forming abstractions, and learning about the relational and causal structure of the world. Here, we explore whether reinforcement learning agents might likewise benefit from explanations. We outline a family of relational tasks that involve selecting an object that is the odd one out in a set (i.e., unique along one of many possible feature dimensions). Odd-one-out tasks require agents to reason over multi-dimensional relationships among a set of objects. We show that agents do not learn these tasks well from reward alone, but achieve >90% performance when they are also trained to generate language explaining object properties or why a choice is correct or incorrect. In further experiments, we show how predicting explanations enables agents to generalize appropriately from ambiguous, causally-confounded training, and even to meta-learn to perform experimental interventions to identify causal structure. We show that explanations help overcome the tendency of agents to fixate on simple features, and explore which aspects of explanations make them most beneficial. Our results suggest that learning from explanations is a powerful principle that could offer a promising path towards training more robust and general machine learning systems.

【28】 Interpolating between BSDEs and PINNs -- deep learning for elliptic and parabolic boundary value problems
Link: https://arxiv.org/abs/2112.03749

Authors: Nikolas Nüsken, Lorenz Richter
Affiliations: Institute of Mathematics, Brandenburgische Technische Universität Cottbus-Senftenberg, Cottbus, Germany; dida Datenschmiede GmbH, Berlin, Germany
Abstract: Solving high-dimensional partial differential equations is a recurrent challenge in economics, science and engineering. In recent years, a great number of computational approaches have been developed, most of them relying on a combination of Monte Carlo sampling and deep learning based approximation. For elliptic and parabolic problems, existing methods can broadly be classified into those resting on reformulations in terms of backward stochastic differential equations (BSDEs) and those aiming to minimize a regression-type $L^2$-error (physics-informed neural networks, PINNs). In this paper, we review the literature and suggest a methodology based on the novel diffusion loss that interpolates between BSDEs and PINNs. Our contribution opens the door towards a unified understanding of numerical approaches for high-dimensional PDEs, as well as for implementations that combine the strengths of BSDEs and PINNs. We also provide generalizations to eigenvalue problems and perform extensive numerical studies, including calculations of the ground state for nonlinear Schrödinger operators and committor functions relevant in molecular dynamics.
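
For context on the PINN end of the interpolation, the standard $L^2$-residual objective looks as follows for a 1D Poisson problem $-u''=f$ with zero boundary data. This is a generic PINN sketch (PyTorch); the paper's diffusion loss itself is not reproduced here, and the architecture and sampling choices are placeholders.

```python
import math
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
f = lambda x: (math.pi ** 2) * torch.sin(math.pi * x)   # true u = sin(pi x)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.rand(256, 1, requires_grad=True)          # interior collocation points
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = (-d2u - f(x)).pow(2).mean()              # L^2 PDE residual
    xb = torch.tensor([[0.0], [1.0]])
    boundary = net(xb).pow(2).mean()                    # enforce u(0) = u(1) = 0
    loss = residual + boundary
    opt.zero_grad(); loss.backward(); opt.step()
```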

【29】 A more efficient algorithm to compute the Rand Index for change-point problems
Link: https://arxiv.org/abs/2112.03738

Authors: Lucas de Oliveira Prates
Abstract: In this paper we provide a more efficient algorithm to compute the Rand Index when the data clusterings come from change-point detection problems. Given $N$ data points and two clusterings of sizes $r$ and $s$, the algorithm runs in $O(r+s)$ time and $O(1)$ memory. The traditional algorithm, in contrast, runs in $O(rs+N)$ time and $O(rs)$ memory.
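
A two-pointer sweep achieves the stated complexity for contiguous change-point clusterings: both segmentations are traversed once, so the run time is O(r+s) with O(1) extra memory. This is a sketch from the abstract's description; the paper's exact formulation may differ.

```python
def rand_index_changepoint(seg_a, seg_b):
    """seg_a, seg_b: segment lengths of the two clusterings, both summing to N."""
    choose2 = lambda m: m * (m - 1) // 2
    n = sum(seg_a)
    same_a = sum(choose2(m) for m in seg_a)   # pairs together in clustering A
    same_b = sum(choose2(m) for m in seg_b)   # pairs together in clustering B
    i = j = agree = 0
    rem_a, rem_b = seg_a[0], seg_b[0]
    while True:                               # merge-style sweep over segments
        o = min(rem_a, rem_b)                 # overlap of the current segments
        agree += choose2(o)                   # pairs together in both clusterings
        rem_a -= o
        rem_b -= o
        if rem_a == 0:
            i += 1
            if i == len(seg_a):
                break
            rem_a = seg_a[i]
        if rem_b == 0:
            j += 1
            rem_b = seg_b[j]
    # RI = (agreements) / (total pairs), via the contingency identity
    return 1.0 - (same_a + same_b - 2 * agree) / choose2(n)
```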

【30】 A Bayesian take on option pricing with Gaussian processes
Link: https://arxiv.org/abs/2112.03718

Authors: Martin Tegner, Stephen Roberts
Affiliations: Department of Engineering & Oxford-Man Institute, University of Oxford
Note: arXiv admin note: text overlap with arXiv:1901.06021
Abstract: Local volatility is a versatile option pricing model due to its state-dependent diffusion coefficient. Calibration is, however, non-trivial as it involves both proposing a hypothesis model of the latent function and a method for fitting it to data. In this paper we present novel Bayesian inference with Gaussian process priors. We obtain a rich representation of the local volatility function with a probabilistic notion of uncertainty attached to the calibration. We propose an inference algorithm and apply our approach to S&P 500 market data.

【31】 Noise Distribution Adaptive Self-Supervised Image Denoising using Tweedie Distribution and Score Matching
Link: https://arxiv.org/abs/2112.03696

Authors: Kwanyoung Kim, Taesung Kwon, Jong Chul Ye
Affiliations: Department of Bio and Brain Engineering; Kim Jaechul Graduate School of AI; Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology (KAIST)
Abstract: Tweedie distributions are a special case of exponential dispersion models, which are often used in classical statistics as distributions for generalized linear models. Here, we reveal that Tweedie distributions also play key roles in the modern deep learning era, leading to a distribution-independent self-supervised image denoising formula without clean reference images. Specifically, by combining the recent Noise2Score self-supervised image denoising approach and the saddle point approximation of the Tweedie distribution, we can provide a general closed-form denoising formula that can be used for large classes of noise distributions without ever knowing the underlying noise distribution. Similar to the original Noise2Score, the new approach is composed of two successive steps: score matching using perturbed noisy images, followed by a closed-form image denoising formula via the distribution-independent Tweedie's formula. This also suggests a systematic algorithm to estimate the noise model and noise parameters for a given noisy image data set. Through extensive experiments, we demonstrate that the proposed method can accurately estimate noise models and parameters, and provides state-of-the-art self-supervised image denoising performance on benchmark and real-world datasets.
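
The Gaussian special case shows the mechanics: with $y = x + N(0,\sigma^2)$, Tweedie's formula gives the posterior mean as $\hat{x} = y + \sigma^{2}\,\nabla_{y}\log p(y)$. In Noise2Score the score is learned from noisy data; below it is available in closed form for a conjugate toy model, so the formula can be checked against the classical posterior mean.

```python
import numpy as np

mu0, tau, sigma = 0.0, 2.0, 1.0
rng = np.random.default_rng(0)
x = rng.normal(mu0, tau, size=100_000)          # clean signal
y = x + rng.normal(0.0, sigma, size=x.size)     # noisy observation

score = lambda y: -(y - mu0) / (tau**2 + sigma**2)   # exact d/dy log p(y) here
x_hat = y + sigma**2 * score(y)                       # Tweedie denoiser

# identical to the conjugate posterior mean (tau^2*y + sigma^2*mu0)/(tau^2+sigma^2)
assert np.allclose(x_hat, (tau**2 * y + sigma**2 * mu0) / (tau**2 + sigma**2))
print("Tweedie MSE:", np.mean((x_hat - x) ** 2),
      "vs noisy MSE:", np.mean((y - x) ** 2))
```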

【32】 Scaling Structured Inference with Randomization 标题:利用随机化扩展结构化推理 链接:https://arxiv.org/abs/2112.03638

作者:Yao Fu,Mirella Lapata 机构:Institute for Language, Cognition and Computation, University of Edinburgh 备注:Preprint 摘要:在深度学习时代,离散图模型状态空间的规模对模型容量至关重要。现有的基于动态规划(DP)的推理通常只能处理少量状态(通常少于数百个)。在这项工作中,我们提出了一系列随机动态规划(RDP)算法,用于将结构化模型扩展到数万个潜在状态。我们的方法广泛适用于经典的基于DP的推理(配分函数、边缘概率、重参数化、熵等)和不同的图结构(链、树以及更一般的超图)。它还与自动微分兼容,因此可以与神经网络无缝集成,并使用基于梯度的优化器进行学习。我们的核心技术是随机化,即在一小部分选定节点上对DP进行限制和重新加权,从而将计算量减少几个数量级。通过Rao-Blackwell化和重要性抽样,我们进一步实现了低偏差和低方差。在不同图上进行的不同推理实验证明了我们方法的准确性和有效性。此外,当使用RDP训练大规模结构化VAE时,它在测试似然方面优于基线,并成功地防止了后验塌陷。 摘要:The scale of the state space of discrete graphical models is crucial for model capacity in the era of deep learning. Existing dynamic programming (DP) based inference typically works with a small number of states (usually less than hundreds). In this work, we propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states. Our method is widely applicable to classical DP-based inference (partition, marginal, reparameterization, entropy, etc.) and different graph structures (chains, trees, and more general hypergraphs). It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly and learned with gradient-based optimizers. Our core technique is randomization, which is to restrict and reweight DP on a small selected subset of nodes, leading to computation reduction by orders of magnitude. We further achieve low bias and variance with Rao-Blackwellization and importance sampling. Experiments on different inferences over different graphs demonstrate the accuracy and efficiency of our methods. Furthermore, when using RDP to train a scaled structured VAE, it outperforms baselines in terms of test likelihood and successfully prevents posterior collapse.
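"在一小部分选定节点上限制并重新加权 DP"的思想,可以用链式模型配分函数的一个简化草图来说明:每步均匀抽取 $k$ 个状态并乘以 $K/k$ 重加权,得到每个内层求和的无偏(Horvitz-Thompson 式)估计。以下代码为本文自拟的示意,省略了原文用于降方差的 Rao-Blackwell 化与重要性抽样:

```python
import numpy as np

rng = np.random.default_rng(0)

def partition_exact(psi):
    # psi[t][i, j]: potential linking state i at step t to state j at step t+1
    alpha = np.ones(psi[0].shape[0])
    for P in psi:
        alpha = alpha @ P            # full O(K^2) sum per step
    return alpha.sum()

def partition_rdp(psi, k):
    # keep a uniform random subset of k states per step, reweighted by K/k:
    # an unbiased Horvitz-Thompson estimate of each inner sum
    K = psi[0].shape[0]
    alpha = np.ones(K)
    for P in psi:
        idx = rng.choice(K, size=k, replace=False)
        alpha = (K / k) * alpha[idx] @ P[idx]   # O(kK) instead of O(K^2)
    return alpha.sum()

psi = [rng.uniform(0.5, 1.5, size=(50, 50)) for _ in range(6)]
est = np.mean([partition_rdp(psi, k=10) for _ in range(2000)])
print(partition_exact(psi), est)   # the two agree in expectation
```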

【33】 Federated Causal Discovery 标题:联邦因果发现 链接:https://arxiv.org/abs/2112.03555

作者:Erdun Gao,Junjia Chen,Li Shen,Tongliang Liu,Mingming Gong,Howard Bondell 机构:†The University of Melbourne, ‡Xi'an Jiaotong University, ⋄JD Explore Academy, §The University of Sydney 摘要:因果发现旨在从观测数据中学习因果图。迄今为止,大多数因果发现方法都需要将数据存储在中央服务器中。然而,数据所有者为避免隐私泄露而逐渐拒绝共享其个性化数据,集中数据这第一步被切断,使任务更加棘手。于是出现了一个难题:$\textit{我们如何从分散数据中推断因果关系?}$在本文中,在数据的加性噪声模型假设下,我们迈出了第一步,开发了一个名为DAG共享联邦因果发现(DS-FCD)的基于梯度的学习框架,它可以在不直接接触本地数据的情况下学习因果图,并自然处理数据异构性。DS-FCD得益于每个本地模型的两级结构。第一级学习因果图并与服务器通信以从其他客户端获取模型信息,而第二级近似因果机制,并根据自身数据进行个性化更新以适应数据异构性。此外,DS-FCD利用无环性等式约束将整个学习任务表述为一个连续优化问题,可以自然地用梯度下降法求解。在合成数据集和真实数据集上的大量实验验证了该方法的有效性。 摘要:Causal discovery aims to learn a causal graph from observational data. To date, most causal discovery methods require data to be stored in a central server. However, data owners gradually refuse to share their personalized data to avoid privacy leakage, making this task more troublesome by cutting off the first step. A puzzle arises: $\textit{how do we infer causal relations from decentralized data?}$ In this paper, with the additive noise model assumption of data, we take the first step in developing a gradient-based learning framework named DAG-Shared Federated Causal Discovery (DS-FCD), which can learn the causal graph without directly touching local data and naturally handle the data heterogeneity. DS-FCD benefits from a two-level structure of each local model. The first level learns the causal graph and communicates with the server to get model information from other clients, while the second level approximates causal mechanisms and personally updates from its own data to accommodate the data heterogeneity. Moreover, DS-FCD formulates the overall learning task as a continuous optimization problem by taking advantage of an equality acyclicity constraint, which can be naturally solved by gradient descent methods. Extensive experiments on both synthetic and real-world datasets verify the efficacy of the proposed method.
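摘要中的"无环性等式约束"在连续优化式因果发现中常取 NOTEARS 型形式 $h(W)=\mathrm{tr}(e^{W\circ W})-d$,当且仅当加权邻接矩阵 $W$ 对应的图无环时取零。下面的片段只演示这一约束函数本身;DS-FCD 实际采用的具体形式以原文为准:

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """NOTEARS-style constraint h(W) = tr(exp(W * W)) - d, zero iff the
    weighted adjacency matrix W encodes a DAG (W * W is elementwise)."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d

W_dag = np.array([[0.0, 1.0], [0.0, 0.0]])   # edge 0 -> 1, acyclic
W_cyc = np.array([[0.0, 1.0], [1.0, 0.0]])   # a 2-cycle
print(acyclicity(W_dag))   # ~0.0
print(acyclicity(W_cyc))   # > 0, penalizing the cycle
```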

【34】 Generative Adversarial Networks for Labeled Data Creation for Structural Damage Detection 标题:用于结构损伤检测标签数据生成的生成式对抗性网络 链接:https://arxiv.org/abs/2112.03478

作者:Furkan Luleci,F. Necati Catbas,Onur Avci 机构:Department of Civil, Environmental, and Construction Engineering, University of Central Florida, Orlando, FL, USA, Department of Civil, Construction, and Environmental Engineering, Iowa State University, Ames, IA 摘要:在过去的几十年里,数据科学领域取得了巨大的进步,其他学科也不断从中受益。结构健康监测(SHM)是利用人工智能(AI)如机器学习(ML)和深度学习(DL)算法,根据收集的数据对土木结构进行状态评估的领域之一。ML和DL方法需要大量数据用于训练;然而,在SHM中,从土木结构收集数据非常费力,尤其是获取有用的数据(与损伤相关的数据)可能非常具有挑战性。本文采用梯度惩罚的一维瓦瑟斯坦深度卷积生成对抗网络(1-D WDCGAN-GP)生成合成的带标签振动数据。然后,利用1-D深卷积神经网络(1-D DCNN)对不同水平的合成增强振动数据集进行结构损伤检测。损伤检测结果表明,一维WDCGAN-GP可以成功地用于解决基于振动的土木结构损伤诊断中的数据稀缺问题。关键词:结构健康监测(SHM)、结构损伤诊断、结构损伤检测、1-D深卷积神经网络(1-D DCNN)、1-D生成对抗网络(1-D GAN)、深卷积生成对抗网络(DCGAN)、带梯度惩罚的Wasserstein生成对抗网络(WGAN-GP) 摘要:There has been a drastic progression in the field of Data Science in the last few decades and other disciplines have been continuously benefitting from it. Structural Health Monitoring (SHM) is one of those fields that use Artificial Intelligence (AI) such as Machine Learning (ML) and Deep Learning (DL) algorithms for condition assessment of civil structures based on the collected data. The ML and DL methods require plenty of data for training procedures; however, in SHM, data collection from civil structures is very exhaustive; particularly getting useful data (damage associated data) can be very challenging. This paper uses 1-D Wasserstein Deep Convolutional Generative Adversarial Networks using Gradient Penalty (1-D WDCGAN-GP) for synthetic labeled vibration data generation. Then, it implements structural damage detection on different levels of synthetically enhanced vibration datasets by using 1-D Deep Convolutional Neural Network (1-D DCNN). The damage detection results show that the 1-D WDCGAN-GP can be successfully utilized to tackle data scarcity in vibration-based damage diagnostics of civil structures. Keywords: Structural Health Monitoring (SHM), Structural Damage Diagnostics, Structural Damage Detection, 1-D Deep Convolutional Neural Networks (1-D DCNN), 1-D Generative Adversarial Networks (1-D GAN), Deep Convolutional Generative Adversarial Networks (DCGAN), Wasserstein Generative Adversarial Networks with Gradient Penalty (WGAN-GP)
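标题中的 WGAN-GP 指在 Wasserstein 判别器损失上附加梯度惩罚 $\lambda(\lVert\nabla_{\hat{x}}D(\hat{x})\rVert_2-1)^2$,其中 $\hat{x}$ 为真实样本与生成样本的随机插值。下面是该惩罚项的通用 PyTorch 草图(把一维振动窗口展平处理;网络与数据均为占位假设,并非论文的 1-D DCGAN 结构):

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP penalty: lam * (||grad_xhat D(xhat)||_2 - 1)^2 averaged over
    random interpolates xhat between real and generated samples."""
    eps = torch.rand(real.size(0), 1)                       # per-sample mixing weight
    xhat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_xhat = critic(xhat)
    grads, = torch.autograd.grad(d_xhat.sum(), xhat, create_graph=True)
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()

# placeholder critic over flattened length-64 "vibration windows"
critic = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.LeakyReLU(),
                             torch.nn.Linear(32, 1))
real = torch.randn(8, 64)   # stand-in for real vibration windows
fake = torch.randn(8, 64)   # stand-in for generator output
print(gradient_penalty(critic, real, fake))
```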

【35】 Spectral Complexity-scaled Generalization Bound of Complex-valued Neural Networks 标题:复值神经网络的谱复杂度泛化界 链接:https://arxiv.org/abs/2112.03467

作者:Haowen Chen,Fengxiang He,Shiye Lei,Dacheng Tao 机构:University of Hong Kong 摘要:复值神经网络(CVNN)在信号处理和图像识别等领域有着广泛的应用。然而,尽管泛化性能对于确保CVNN在未知数据上的表现至关重要,关注CVNN泛化的工作却很少。本文首次证明了复值神经网络的泛化界。该界随谱复杂度缩放,其主导因子是权重矩阵谱范数的乘积。此外,当训练数据为序列数据时,我们的工作也为CVNN给出了一个泛化界,该界同样受谱复杂度影响。理论上,这些界通过Maurey稀疏化引理和Dudley熵积分推导得到。实验上,我们在不同数据集(MNIST、FashionMNIST、CIFAR-10、CIFAR-100、Tiny ImageNet和IMDB)上训练复值卷积神经网络进行验证。这些数据集上的Spearman秩相关系数及相应的p值有力地表明,以权重矩阵谱范数乘积度量的网络谱复杂度与泛化能力具有统计显著的相关性。 摘要:Complex-valued neural networks (CVNNs) have been widely applied to various fields, especially signal processing and image recognition. However, few works focus on the generalization of CVNNs, albeit it is vital to ensure the performance of CVNNs on unseen data. This paper is the first work that proves a generalization bound for the complex-valued neural network. The bound scales with the spectral complexity, the dominant factor of which is the spectral norm product of weight matrices. Further, our work provides a generalization bound for CVNNs when training data is sequential, which is also affected by the spectral complexity. Theoretically, these bounds are derived via Maurey Sparsification Lemma and Dudley Entropy Integral. Empirically, we conduct experiments by training complex-valued convolutional neural networks on different datasets: MNIST, FashionMNIST, CIFAR-10, CIFAR-100, Tiny ImageNet, and IMDB. Spearman's rank-order correlation coefficients and the corresponding p values on these datasets give strong proof that the spectral complexity of the network, measured by the weight matrices spectral norm product, has a statistically significant correlation with the generalization ability.
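界的主导因子,即权重矩阵谱范数的乘积,可以直接由最大奇异值算出(NumPy 的 SVD 对复矩阵同样适用)。下面的小片段只演示这一个量;完整的谱复杂度还包含原文中的其他项:

```python
import numpy as np

def spectral_norm_product(weights):
    """Dominant factor of the spectral complexity: the product of the
    largest singular values of the (possibly complex) weight matrices."""
    return float(np.prod([np.linalg.svd(W, compute_uv=False)[0] for W in weights]))

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 32)) + 1j * rng.normal(size=(64, 32)),
          rng.normal(size=(10, 64)) + 1j * rng.normal(size=(10, 64))]
print(spectral_norm_product(layers))
```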

【36】 A Unified Framework for Multi-distribution Density Ratio Estimation 标题:一种多分布密度比估计的统一框架 链接:https://arxiv.org/abs/2112.03440

作者:Lantao Yu,Yujia Jin,Stefano Ermon 机构:Department of Computer Science, Stanford University, Department of Management Science and Engineering 摘要:二元密度比估计(DRE)是在给定经验样本的情况下估计比率$p_1/p_2$的问题,它为对比表示学习和协变量偏移自适应等许多最先进的机器学习算法提供了基础。在这项工作中,我们考虑一种广义设置:给定来自多个分布$p_1,\ldots,p_k$($k>2$)的样本,我们的目标是高效地估计所有分布对之间的密度比。这种推广带来了重要的新应用,例如估计多个随机变量之间的统计差异(如多分布$f$-散度),以及通过多重重要性抽样进行偏差校正。然后,我们从Bregman散度最小化的角度发展了一个通用框架,其中每个严格凸的多元函数都会为多分布DRE导出一个适当的损失。此外,我们重新推导了多分布密度比估计与类概率估计之间的理论联系,证明了在多分布DRE中使用任何严格恰当评分规则与连接函数复合的合理性。我们表明,我们的框架既导出了严格推广二元DRE对应方法的方法,也产生了在各种下游任务中表现相当或更优的新方法。 摘要:Binary density ratio estimation (DRE), the problem of estimating the ratio $p_1/p_2$ given their empirical samples, provides the foundation for many state-of-the-art machine learning algorithms such as contrastive representation learning and covariate shift adaptation. In this work, we consider a generalized setting where given samples from multiple distributions $p_1, \ldots, p_k$ (for $k > 2$), we aim to efficiently estimate the density ratios between all pairs of distributions. Such a generalization leads to important new applications such as estimating statistical discrepancy among multiple random variables like multi-distribution $f$-divergence, and bias correction via multiple importance sampling. We then develop a general framework from the perspective of Bregman divergence minimization, where each strictly convex multivariate function induces a proper loss for multi-distribution DRE. Moreover, we rederive the theoretical connection between multi-distribution density ratio estimation and class probability estimation, justifying the use of any strictly proper scoring rule composite with a link function for multi-distribution DRE. We show that our framework leads to methods that strictly generalize their counterparts in binary DRE, as well as new methods that show comparable or superior performance on various downstream tasks.
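该框架所推广的二元特例可以经由类概率估计实现:训练一个区分 $p_1$ 与 $p_2$ 样本的概率分类器,再把预测几率乘以样本量之比换算为密度比。下面用 logistic 回归在两个高斯分布上演示,并与解析真值对照(数据为自拟玩具设定):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, size=(4000, 1))   # samples from p_1 = N(0, 1)
x2 = rng.normal(1.0, 1.0, size=(4000, 1))   # samples from p_2 = N(1, 1)

X = np.vstack([x1, x2])
y = np.concatenate([np.ones(len(x1)), np.zeros(len(x2))])
clf = LogisticRegression().fit(X, y)        # class probability estimation

def ratio(x):
    # odds P(y=1|x)/P(y=0|x) times n2/n1 recovers the density ratio p1/p2
    p = clf.predict_proba(x)[:, 1]
    return p / (1 - p) * (len(x2) / len(x1))

xs = np.array([[0.0], [0.5], [1.0]])
true = np.exp((1 - 2 * xs.ravel()) / 2)     # exact N(0,1)/N(1,1) ratio
print(ratio(xs), true)                      # the estimates should be close
```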

【37】 First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach 标题:线性函数逼近强化学习中的一阶遗憾:一种稳健估计方法 链接:https://arxiv.org/abs/2112.03432

作者:Andrew Wagenmaker,Yifang Chen,Max Simchowitz,Simon S. Du,Kevin Jamieson 摘要:获得一阶遗憾界(即不随最坏情况缩放、而是随最优策略在给定实例上的某种性能度量缩放的遗憾界)是序贯决策中的一个核心问题。虽然这种界在许多设置中存在,但在具有大状态空间的强化学习中一直难以获得。在这项工作中,我们填补了这一空白,并表明在具有大状态空间的强化学习(即线性MDP设置)中,有可能获得按$\mathcal{O}(\sqrt{V_1^\star K})$缩放的遗憾。这里$V_1^\star$是最优策略的值,$K$是回合(episode)数。我们证明了现有的基于最小二乘估计的技术不足以获得这一结果,转而开发了一种基于鲁棒Catoni均值估计器的新型鲁棒自归一化集中界,其本身可能具有独立的意义。 摘要:Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance -- is a core question in sequential decision-making. While such bounds exist in many settings, they have proven elusive in reinforcement learning with large state spaces. In this work we address this gap, and show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces, namely the linear MDP setting. Here $V_1^\star$ is the value of the optimal policy and $K$ is the number of episodes. We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result, and instead develop a novel robust self-normalized concentration bound based on the robust Catoni mean estimator, which may be of independent interest.
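摘要提到的 Catoni 均值估计器是经典的重尾稳健均值:取 $\theta$ 使 $\sum_i \psi(\alpha(x_i-\theta))=0$,其中常用的影响函数为 $\psi(t)=\mathrm{sign}(t)\log(1+|t|+t^2/2)$。下面给出一个用二分法求根的简易实现(仅演示经典估计器本身,原文的自归一化集中界不在此列):

```python
import numpy as np

def catoni_mean(x, alpha=0.1, tol=1e-10):
    """Catoni's robust mean: the root theta of sum_i psi(alpha*(x_i - theta)) = 0
    with the soft influence function psi(t) = sign(t) * log(1 + |t| + t^2/2)."""
    psi = lambda t: np.sign(t) * np.log1p(np.abs(t) + 0.5 * t * t)
    f = lambda theta: psi(alpha * (x - theta)).sum()
    lo, hi = x.min(), x.max()   # f is decreasing in theta, with f(lo) >= 0 >= f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 1000), [500.0]])   # one heavy outlier
print(x.mean(), catoni_mean(x))   # the Catoni estimate barely moves
```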

【38】 Causal Analysis and Classification of Traffic Crash Injury Severity Using Machine Learning Algorithms 标题:基于机器学习算法的交通碰撞伤严重程度原因分析与分类 链接:https://arxiv.org/abs/2112.03407

作者:Meghna Chakraborty,Timothy Gates,Subhrajit Sinha 机构:Department of Civil and Environmental Engineering, Michigan State University, South Shaw, Pacific Northwest National Laboratory, Battelle Blvd, Richland, WA 摘要:应用非参数方法对交通事故进行损伤严重程度的因果分析和分类受到了有限的关注。本研究采用不同的机器学习技术,包括决策树(DT)、随机森林(RF)、极端梯度增强(XGBoost)和深度神经网络(DNN),提出了一个因果推断的方法框架,使用格兰杰因果关系分析和州际交通事故伤害严重程度分类。本研究中使用的数据是针对2014年至2019年间德克萨斯州所有州际公路上的交通事故获得的。建议的严重性分类方法的输出包括致命和严重伤害(KA)碰撞、非严重和可能伤害(BC)碰撞以及仅财产损失(PDO)碰撞的三类。格兰杰因果关系有助于确定影响碰撞严重性的最具影响力的因素,而基于学习的模型预测了性能不同的严重性等级。Granger因果关系分析的结果确定,限速、地面和天气条件、交通量、工作区的存在、工作区的工人和高占用率车辆(HOV)车道等是影响碰撞严重性的最重要因素。分类器的预测性能在不同类别中产生不同的结果。具体而言,虽然决策树和随机森林分类器分别为数据中最稀有的KA类的PDO和BC严重性提供了最大的性能,但深度神经网络分类器的性能优于所有其他算法,这很可能是由于其逼近非线性模型的能力。本研究有助于使用非参数方法对交通碰撞损伤严重程度进行因果分析和分类预测,这方面的知识非常有限。 摘要:Causal analysis and classification of injury severity applying non-parametric methods for traffic crashes has received limited attention. This study presents a methodological framework for causal inference, using Granger causality analysis, and injury severity classification of traffic crashes, occurring on interstates, with different machine learning techniques including decision trees (DT), random forest (RF), extreme gradient boosting (XGBoost), and deep neural network (DNN). The data used in this study were obtained for traffic crashes on all interstates across the state of Texas from a period of six years between 2014 and 2019. The output of the proposed severity classification approach includes three classes for fatal and severe injury (KA) crashes, non-severe and possible injury (BC) crashes, and property damage only (PDO) crashes. While Granger Causality helped identify the most influential factors affecting crash severity, the learning-based models predicted the severity classes with varying performance. The results of Granger causality analysis identified the speed limit, surface and weather conditions, traffic volume, presence of workzones, workers in workzones, and high occupancy vehicle (HOV) lanes, among others, as the most important factors affecting crash severity. The prediction performance of the classifiers yielded varying results across the different classes. Specifically, while decision tree and random forest classifiers provided the greatest performance for PDO and BC severities, respectively, for the KA class, the rarest class in the data, deep neural net classifier performed superior than all other algorithms, most likely due to its capability of approximating nonlinear models. This study contributes to the limited body of knowledge pertaining to causal analysis and classification prediction of traffic crash injury severity using non-parametric approaches.
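摘要中的 Granger 因果分析可用 statsmodels 的标准检验完成:检验某变量的滞后项是否显著改进对目标序列的预测。下面是一个自拟的双变量玩具示例(变量名与数据均为假设,并非原文的德州数据),展示该 API 的典型用法:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)        # hypothetical weekly traffic-volume series
y = np.zeros(n)               # hypothetical severity-index series driven by lagged x
for t in range(2, n):
    y[t] = 0.6 * x[t - 1] + 0.2 * y[t - 1] + rng.normal()

# tests whether the second column Granger-causes the first column
data = pd.DataFrame({"severity": y, "volume": x})
res = grangercausalitytests(data[["severity", "volume"]], maxlag=3)
for lag, (tests, _) in res.items():
    f, p = tests["ssr_ftest"][:2]
    print(f"lag {lag}: F = {f:.1f}, p = {p:.3g}")   # small p: volume Granger-causes severity
```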

【39】 Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design 标题:嵌套双曲空间降维与双曲神经网络设计 链接:https://arxiv.org/abs/2112.03402

作者:Xiran Fan,Chun-Hao Yang,Baba C. Vemuri 机构:Department of Statistics, National Taiwan University, Institute of Applied Mathematical Science, Department of CISE, University of Florida 备注:19 pages, 6 figures 摘要:双曲神经网络由于能够高效且有效地表示层次化数据集,近来广受欢迎。开发这些网络的挑战在于嵌入空间即双曲空间的非线性。双曲空间是洛伦兹群的齐次黎曼流形。大多数现有方法(除了一些例外)使用局部线性化来定义各种操作,这些操作与欧氏空间中传统深度神经网络中使用的操作并行。在本文中,我们提出了一种新的完全双曲型神经网络,它使用了投影(嵌入)的概念,然后在双曲空间中使用了内在聚集和非线性。这里的新颖之处在于投影,该投影设计用于将数据投影到低维嵌入双曲空间,从而得到一种嵌套双曲空间表示,其本身即可独立用于降维。主要的理论贡献是证明了所提出的嵌入在洛伦兹变换下是等距且等变的。该投影在计算上是有效的,因为它可以用简单的线性运算来表示,并且由于上述等变特性,它允许权重共享。嵌套双曲空间表示是我们网络的核心组成部分,因此,我们首先将该嵌套双曲空间表示与其他降维方法(如切空间PCA、主测地分析(PGA)和HoroPCA)进行比较。基于这种等变嵌入,我们开发了一种新的全双曲图卷积神经网络结构来学习投影参数。最后,我们在几个公开的数据集上展示了我们网络的比较性能。 摘要:Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space namely, the Hyperbolic space. Hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group. Most existing methods (with some exceptions) use local linearization to define a variety of operations paralleling those used in traditional deep neural networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity all within the hyperbolic space. The novelty here lies in the projection which is designed to project data on to a lower-dimensional embedded hyperbolic space and hence leads to a nested hyperbolic space representation independently useful for dimensionality reduction. The main theoretical contribution is that the proposed embedding is proved to be isometric and equivariant under the Lorentz transformations. This projection is computationally efficient since it can be expressed by simple linear operations, and, due to the aforementioned equivariance property, it allows for weight sharing. The nested hyperbolic space representation is the core component of our network and therefore, we first compare this ensuing nested hyperbolic space representation with other dimensionality reduction methods such as tangent PCA, principal geodesic analysis (PGA) and HoroPCA. Based on this equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural network architecture to learn the parameters of the projection. Finally, we present experiments demonstrating comparative performance of our network on several publicly available data sets.
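该网络建立在双曲空间的洛伦兹(双曲面)模型之上:点满足 $\langle x,x\rangle_L=-1$,其中闵可夫斯基内积为 $\langle x,y\rangle_L=-x_0y_0+\sum_{i\ge 1}x_iy_i$。下面演示这一基本构造(仅为该模型的入门操作,原文的嵌套投影与图卷积结构未在此复现):

```python
import numpy as np

def lorentz_inner(x, y):
    # Minkowski inner product <x, y>_L = -x_0 y_0 + sum_{i>=1} x_i y_i
    return -x[0] * y[0] + x[1:] @ y[1:]

def lift_to_hyperboloid(v):
    """Map spatial coordinates v in R^d onto the Lorentz model
    H^d = {x : <x, x>_L = -1, x_0 > 0} by solving for the time coordinate."""
    return np.concatenate([[np.sqrt(1.0 + v @ v)], v])

v = np.array([0.3, -1.2, 0.5])
x = lift_to_hyperboloid(v)
print(lorentz_inner(x, x))   # -1.0 up to floating point, so x lies on H^3
```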

【40】 RafterNet: Probabilistic predictions in multi-response regression 标题:RafterNet:多响应回归中的概率预测 链接:https://arxiv.org/abs/2112.03377

作者:Marius Hofert,Avinash Prasad,Mu Zhu 摘要:介绍了一种在多响应回归问题中进行概率预测的全非参数方法。随机森林被用作每个响应变量的边际模型,并且,作为本研究的新贡献,多个响应变量之间的依赖性由生成性神经网络建模。这种随机森林、相应的经验边际残差分布和生成型神经网络的组合建模方法称为RafterNet。多个数据集用作示例,以证明该方法的灵活性及其对概率预测的影响。 摘要:A fully nonparametric approach for making probabilistic predictions in multi-response regression problems is introduced. Random forests are used as marginal models for each response variable and, as novel contribution of the present work, the dependence between the multiple response variables is modeled by a generative neural network. This combined modeling approach of random forests, corresponding empirical marginal residual distributions and a generative neural network is referred to as RafterNet. Multiple datasets serve as examples to demonstrate the flexibility of the approach and its impact for making probabilistic forecasts.
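按摘要的两阶段思路,下面给出一个简化草图:第一阶段对每个响应各拟合一个随机森林,第二阶段对联合残差的依赖结构建模。此处用"对残差行重抽样"这一经验分布替身代替原文的生成式神经网络,数据与函数名均为本文自拟:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.normal(size=(n, p))
# two correlated toy responses
eps = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=n)
Y = np.column_stack([X[:, 0] + eps[:, 0], np.sin(X[:, 1]) + eps[:, 1]])

# stage 1: one marginal random forest per response
forests = [RandomForestRegressor(n_estimators=200, random_state=0).fit(X, Y[:, j])
           for j in range(Y.shape[1])]
resid = np.column_stack([Y[:, j] - forests[j].predict(X) for j in range(Y.shape[1])])

# stage 2: dependence model on the joint residuals; here we resample whole
# residual rows (empirical joint distribution) where the paper fits a
# generative neural network instead
def sample_predictive(x_new, m=1000):
    mu = np.array([f.predict(x_new[None, :])[0] for f in forests])
    draws = resid[rng.integers(0, len(resid), size=m)]
    return mu + draws     # m joint probabilistic forecasts

print(sample_predictive(np.zeros(p))[:3])
```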

【41】 Quality control for more reliable integration of deep learning-based image segmentation into medical workflows 标题:将基于深度学习的图像分割更可靠地集成到医疗工作流中的质量控制 链接:https://arxiv.org/abs/2112.03277

作者:Elena Williams,Sebastian Niehaus,Janis Reinelt,Alberto Merola,Paul Glad Mihai,Ingo Roeder,Nico Scherf,Maria del C. Valdés Hernández 机构:AICURA medical, Bessemerstrasse, Berlin, Germany; Centre for Clinical Brain Sciences, University of Edinburgh; Institute for Medical Informatics and Biometry, Technische Universität Dresden, Fetscherstrasse, Dresden, Germany 备注:25 pages 摘要:机器学习算法是现代诊断辅助软件的基础,该软件在临床实践中,特别是在放射学中被证明是有价值的。然而,主要由于可用于训练这些算法的临床样本有限而产生的不准确,妨碍了它们更广泛的适用性以及临床医生的接受与认可。我们分析了可在这些算法中实现、用以估计其输出确定性的最先进自动质量控制(QC)方法。我们在脑图像分割任务中验证了最有希望的方法,以识别磁共振成像数据中的白质高信号(WMH)。WMH与成年中后期常见的小血管疾病相关,由于其大小和分布模式多样,分割尤其具有挑战性。我们的结果表明,不确定性聚合和Dice预测在该任务的故障检测中最为有效。两种方法各自独立地将平均Dice系数从0.82提高到0.84。我们的工作揭示了QC方法如何帮助检测分割失败的案例,从而使自动分割更可靠,更适合临床实践。 摘要:Machine learning algorithms underpin modern diagnostic-aiding software, which has proved valuable in clinical practice, particularly in radiology. However, inaccuracies, mainly due to the limited availability of clinical samples for training these algorithms, hamper their wider applicability, acceptance, and recognition amongst clinicians. We present an analysis of state-of-the-art automatic quality control (QC) approaches that can be implemented within these algorithms to estimate the certainty of their outputs. We validated the most promising approaches on a brain image segmentation task identifying white matter hyperintensities (WMH) in magnetic resonance imaging data. WMH are a correlate of small vessel disease common in mid-to-late adulthood and are particularly challenging to segment due to their varied size, and distributional patterns. Our results show that the aggregation of uncertainty and Dice prediction were most effective in failure detection for this task. Both methods independently improved mean Dice from 0.82 to 0.84. Our work reveals how QC methods can help to detect failed segmentation cases and therefore make automatic segmentation more reliable and suitable for clinical practice.
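结果中报告的 Dice 系数衡量预测掩膜与参考掩膜的重叠程度:$\mathrm{Dice}=2|A\cap B|/(|A|+|B|)$。下面是一个自含的计算示例(玩具掩膜):

```python
import numpy as np

def dice(seg, ref, eps=1e-8):
    """Dice overlap between a predicted and a reference binary mask:
    2 |A ∩ B| / (|A| + |B|)."""
    seg, ref = seg.astype(bool), ref.astype(bool)
    return 2.0 * np.logical_and(seg, ref).sum() / (seg.sum() + ref.sum() + eps)

pred = np.zeros((64, 64), dtype=int); pred[20:40, 20:40] = 1
truth = np.zeros((64, 64), dtype=int); truth[25:45, 25:45] = 1
print(dice(pred, truth))   # 0.5625 for these toy masks
```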

【42】 Toward a Taxonomy of Trust for Probabilistic Machine Learning 标题:面向概率机器学习的信任分类研究 链接:https://arxiv.org/abs/2112.03270

作者:Tamara Broderick,Andrew Gelman,Rachael Meager,Anna L. Smith,Tian Zheng 机构:Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology;, Department of Statistics, Columbia University;, Department of Political Science, Columbia University; 备注:18 pages, 2 figures 摘要:概率机器学习越来越多地为医学、经济、政治等领域的关键决策提供信息。我们需要证据来证明最终的决定是有根据的。为了帮助在这些决策中建立信任,我们开发了一种分类法,描述了分析中的信任可以分解的地方:(1)将现实世界的目标转化为特定可用训练数据集上的目标,(2)将训练数据上的抽象目标转化为具体的数学问题,(3)在使用算法解决所述数学问题时,(4)在使用特定代码实现所选算法时。我们详细说明了信任如何在每一步都会失败,并通过两个案例研究说明了我们的分类:小额信贷的有效性分析和《经济学人》对2020年美国总统选举的预测。最后,我们描述了各种各样的方法,这些方法可用于在分类的每个步骤中增加信任。使用我们的分类法突出了现有信任研究工作集中的步骤,以及建立信任特别具有挑战性的步骤。 摘要:Probabilistic machine learning increasingly informs critical decisions in medicine, economics, politics, and beyond. We need evidence to support that the resulting decisions are well-founded. To aid development of trust in these decisions, we develop a taxonomy delineating where trust in an analysis can break down: (1) in the translation of real-world goals to goals on a particular set of available training data, (2) in the translation of abstract goals on the training data to a concrete mathematical problem, (3) in the use of an algorithm to solve the stated mathematical problem, and (4) in the use of a particular code implementation of the chosen algorithm. We detail how trust can fail at each step and illustrate our taxonomy with two case studies: an analysis of the efficacy of microcredit and The Economist's predictions of the 2020 US presidential election. Finally, we describe a wide variety of methods that can be used to increase trust at each step of our taxonomy. The use of our taxonomy highlights steps where existing research work on trust tends to concentrate and also steps where establishing trust is particularly challenging.

【43】 Breaking the Convergence Barrier: Optimization via Fixed-Time Convergent Flows 标题:打破收敛壁垒:通过固定时间收敛流进行优化 链接:https://arxiv.org/abs/2112.01363

作者:Param Budhraja,Mayank Baranwal,Kunal Garg,Ashish Hota 机构: Indian Institute of Technology Kharagpur, Tata Consultancy Services Research & Innovation, Mumbai, University of California, Santa Cruz 备注:Accepted at AAAI Conference on Artificial Intelligence, 2022, to appear 摘要:加速梯度法是机器学习和其他数据分析领域中自然产生的大规模数据驱动优化问题的基础。我们基于最近提出的动力系统固定时间稳定性概念,引入了一个实现加速的基于梯度的优化框架。该方法是对简单基于梯度方法的推广,经适当缩放后可在与初始化无关的固定时间内收敛到最优点。我们首先利用连续时间框架设计固定时间稳定的动力系统,然后提供一致的离散化策略,使得等价的离散时间算法在实际固定的迭代次数内跟踪最优点。我们还从理论上分析了所提梯度流的收敛性,以及它们对加性扰动的鲁棒性,适用于满足强凸性、严格凸性以及可能非凸但满足Polyak-Łojasiewicz不等式的一系列函数。我们还证明了由于固定时间收敛,收敛速度上的遗憾界是常数。超参数具有直观的解释,并且可以进行调整,以符合所需收敛速度的要求。我们通过一系列数值算例验证了所提方案的加速收敛性,并与最新的优化算法进行了比较。我们的工作为通过连续时间流的离散化开发新的优化算法提供了见解。 摘要:Accelerated gradient methods are the cornerstones of large-scale, data-driven optimization problems that arise naturally in machine learning and other fields concerning data analysis. We introduce a gradient-based optimization framework for achieving acceleration, based on the recently introduced notion of fixed-time stability of dynamical systems. The method presents itself as a generalization of simple gradient-based methods suitably scaled to achieve convergence to the optimizer in a fixed-time, independent of the initialization. We achieve this by first leveraging a continuous-time framework for designing fixed-time stable dynamical systems, and later providing a consistent discretization strategy, such that the equivalent discrete-time algorithm tracks the optimizer in a practically fixed number of iterations. We also provide a theoretical analysis of the convergence behavior of the proposed gradient flows, and their robustness to additive disturbances for a range of functions obeying strong convexity, strict convexity, and possibly nonconvexity but satisfying the Polyak-Łojasiewicz inequality. We also show that the regret bound on the convergence rate is constant by virtue of the fixed-time convergence. The hyperparameters have intuitive interpretations and can be tuned to fit the requirements on the desired convergence rates. We validate the accelerated convergence properties of the proposed schemes on a range of numerical examples against the state-of-the-art optimization algorithms. Our work provides insights on developing novel optimization algorithms via discretization of continuous-time flows.
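作为摘要思想的示意,下面给出一个"固定时间"风格重标度梯度流的前向欧拉离散草图:流场同时包含次线性($e_1<1$)与超线性($e_2>1$)两个梯度幂次项,前者带来最优点附近的有限时间收敛,后者使从任意远初值到达其邻域的时间有界。指数与系数均为示例取值,并非论文的精确格式;朴素欧拉步在最优点的极小邻域内会抖动,这也正是原文强调需要一致离散化策略的原因:

```python
import numpy as np

def fxts_gradient_descent(grad, x0, c1=1.0, c2=1.0, e1=0.5, e2=1.5,
                          dt=1e-2, steps=2000):
    """Forward-Euler discretization of a fixed-time-style gradient flow
        dx/dt = -c1 * g * ||g||^(e1-1) - c2 * g * ||g||^(e2-1),  g = grad f(x).
    The e1 < 1 term gives finite-time behavior near the optimum; the e2 > 1
    term bounds the time needed to reach its neighborhood from any start."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        ng = np.linalg.norm(g)
        if ng < 1e-12:
            break
        x = x - dt * (c1 * ng ** (e1 - 1) + c2 * ng ** (e2 - 1)) * g
    return x

# strongly convex quadratic f(x) = ||x||^2: far and near initializations both
# end up within ~1e-4 of the optimum (the naive Euler step chatters there)
grad = lambda x: 2.0 * x
print(fxts_gradient_descent(grad, [100.0, -50.0]))
print(fxts_gradient_descent(grad, [1e-3, 1e-3]))
```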

机器翻译,仅供参考