
Learning Reinforced Dynamic Representations for Sequential Recommendation

Posted: 2023-04-18 14:49:26

In recent years, sequential recommendation systems have become important for alleviating information overload in many online services. Current sequential recommendation methods focus on learning a fixed number of representations for each user at any given time, using either a single representation or multi-interest representations. However, when a user browses items on an e-commerce recommendation system, the number of that user's interests may change over time (e.g., increase or decrease by one interest), driven by the user's evolving needs. Moreover, different users may have different numbers of interests. This paper argues that it is meaningful to explore a personalized, dynamic number of user interests and to learn a correspondingly dynamic group of user interest representations. The authors propose a Reinforced sequential model with a Dynamic number of interest Representations for recommendation Systems (RDRSR). Specifically, RDRSR consists of a Dynamic Interest Discriminator (DID) module and a Dynamic Interest Allocator (DIA) module. The DID module infers the number of a user's interests by learning the overall sequential characteristics with bi-directional self-attention and Gumbel-Softmax. The DIA module allocates the historically clicked items into a group of sub-sequences and constructs the user's dynamic interest representations. The allocation problem is formalized as a Markov Decision Process (MDP), and for each item an action is sampled from a policy pi to determine which sub-sequence the item belongs to. Finally, experiments on real-world datasets demonstrate the effectiveness of the model.
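Neither the post nor the abstract includes code. As a rough illustration of the DID idea only (encoding the click sequence with bi-directional, i.e. unmasked, self-attention, then sampling a discrete interest count with Gumbel-Softmax), here is a minimal PyTorch sketch. The class name, pooling choice, dimensions, and the `max_interests` cap are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicInterestDiscriminator(nn.Module):
    """Sketch of the DID step: bi-directional self-attention over the
    click sequence, then a Gumbel-Softmax sample of the interest count.
    All names and sizes here are illustrative assumptions."""

    def __init__(self, hidden_dim=64, max_interests=4, num_heads=2):
        super().__init__()
        # Unmasked (bi-directional) self-attention over the item sequence.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # Maps a pooled sequence summary to logits over 1..max_interests.
        self.count_head = nn.Linear(hidden_dim, max_interests)

    def forward(self, item_emb):                      # item_emb: (B, T, H)
        ctx, _ = self.attn(item_emb, item_emb, item_emb)
        summary = ctx.mean(dim=1)                     # (B, H) sequence summary
        logits = self.count_head(summary)             # (B, max_interests)
        # hard=True gives a one-hot sample in the forward pass while
        # gradients flow through the soft relaxation in the backward pass,
        # which keeps the discrete choice trainable end to end.
        one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)
        k = one_hot.argmax(dim=-1) + 1                # interest count in 1..max
        return k, one_hot


# Toy usage: a batch of 2 users, 10 clicked items each, 64-dim embeddings.
emb = torch.randn(2, 10, 64)
k, _ = DynamicInterestDiscriminator()(emb)
print(k)  # e.g. tensor([2, 3])
```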

Original title: Learning Reinforced Dynamic Representations for Sequential Recommendation

Original abstract: Recently, sequential recommendation systems have become important in solving the information overload in many online services. Current methods in sequential recommendation focus on learning a fixed number of representations for each user at any time, with a single representation or multi-interest representations for the user. However, when a user is exploring items on an e-commerce recommendation system, the number of this user's interests may change over time (e.g., increase/reduce one interest), affected by the user's evolving self needs. Moreover, different users may have various numbers of interests. In this paper, we argue that it is meaningful to explore a personalized dynamic number of user interests, and learn a dynamic group of user interest representations accordingly. We propose a Reinforced sequential model with a dynamic number of interest representations for recommendation systems (RDRSR). Specifically, RDRSR is composed of a dynamic interest discriminator (DID) module and a dynamic interest allocator (DIA) module. The DID module explores the number of a user's interests by learning the overall sequential characteristics with bi-directional self-attention and Gumbel-Softmax. The DIA module allocates the historical clicked items into a group of sub-sequences and constructs the user's dynamic interest representations. We formalize the allocation problem in the form of a Markov Decision Process (MDP), and sample an action from policy pi for each item to determine which sub-sequence it belongs to. Additionally, experiments on real-world datasets demonstrate our model's effectiveness.
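To make the MDP formulation concrete, here is a similarly hedged sketch of the DIA step: each clicked item is one decision step, and an action sampled from a learned policy picks which of the k sub-sequences the item joins, with the sampled log-probabilities kept for a REINFORCE-style update. The state design (current item plus a summary of the running sub-sequences) and the running-mean interest representations are assumptions of this sketch, not the paper's specification:

```python
import torch
import torch.nn as nn

class DynamicInterestAllocator(nn.Module):
    """Sketch of the DIA step: sequentially assign each clicked item to
    one of k sub-sequences by sampling an action from a policy. The
    state features and update rule are illustrative assumptions."""

    def __init__(self, hidden_dim=64, max_interests=4):
        super().__init__()
        # Policy head: state -> logits over candidate sub-sequences.
        self.policy = nn.Linear(2 * hidden_dim, max_interests)

    def forward(self, item_emb, k):                   # item_emb: (T, H); k: int
        T, H = item_emb.shape
        slots = torch.zeros(k, H)                     # running sub-sequence states
        counts = torch.zeros(k)
        assignments, log_probs = [], []
        for t in range(T):
            # State: current item plus the mean of the slot states so far.
            state = torch.cat([item_emb[t], slots.mean(dim=0)])
            logits = self.policy(state)[:k]           # only k valid actions
            dist = torch.distributions.Categorical(logits=logits)
            a = dist.sample()                         # action: which sub-sequence
            log_probs.append(dist.log_prob(a))        # kept for a policy-gradient update
            # Update the chosen slot as a running mean of its items.
            counts[a] += 1
            slots[a] = slots[a] + (item_emb[t] - slots[a]) / counts[a]
            assignments.append(a.item())
        # slots now hold k dynamic interest representations.
        return slots, assignments, torch.stack(log_probs)


# Toy usage: allocate 10 items into k=3 sub-sequences.
interests, assign, logp = DynamicInterestAllocator()(torch.randn(10, 64), 3)
print(assign)  # e.g. [0, 2, 1, 0, ...]
```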