(Data Mining - Getting Started - 1) User-Based Collaborative Filtering: Nearest Neighbor
Date: 2023-09-14 08:59:06
Contents:
1. What is user-based collaborative filtering
2. Python implementation
1. What is user-based collaborative filtering:
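Collaborative filtering (CF) is commonly used in recommender systems. Sections such as "Customers who bought this item also liked/bought" on e-commerce sites like JD.com and Amazon are typical examples of recommendations produced by collaborative filtering.
User-based collaborative filtering (user-based CF) measures how similar users are by comparing the ratings they gave to the same items, and then makes recommendations based on that similarity.
This post covers the simplest such method, nearest neighbor: find the user most similar to the target user, then recommend to the target user the items that the neighbor has rated but the target user has not, ordered by the neighbor's rating.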
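Drawbacks:
1. There are few users and many items, and not every user has rated every item, so the rating data contains missing values;
2. If the nearest neighbor and the target user have rated exactly the same items, there is nothing left to recommend;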
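The second drawback is easy to reproduce. The following is a minimal sketch, not the code from the book: the `toy` data set, the user names, and the compact `manhattan`/`recommend` helpers are hypothetical and exist only for this illustration. Ann's nearest neighbor is Bob, but Bob has rated exactly the same items as Ann, so the recommendation list comes back empty even though Cat has rated an item that Ann has not seen.

```python
def manhattan(r1, r2):
    """Sum of absolute rating differences over the items both users rated."""
    common = [item for item in r1 if item in r2]
    # Assumption for this sketch: users with no common items are infinitely far apart.
    return sum(abs(r1[i] - r2[i]) for i in common) if common else float("inf")


def recommend(target, ratings):
    """Return the nearest neighbor and the items it rated that target did not."""
    others = [(manhattan(ratings[u], ratings[target]), u)
              for u in ratings if u != target]
    _, nearest = min(others)
    unseen = [(item, score) for item, score in ratings[nearest].items()
              if item not in ratings[target]]
    return nearest, sorted(unseen, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: Bob rated exactly the same items as Ann.
toy = {
    "Ann": {"A": 5.0, "B": 3.0},
    "Bob": {"A": 5.0, "B": 3.5},
    "Cat": {"A": 1.0, "B": 5.0, "C": 4.0},
}

nearest, recs = recommend("Ann", toy)
print(nearest, recs)  # Bob []
```

Looking at more than one neighbor is one way around this, at the cost of a slightly more involved scoring step.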
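Details:
Measuring similarity: Manhattan distance, Euclidean distance, and so on (these are simple; other similarity measures will be covered in later posts). Both are special cases of the Minkowski distance with exponent r, computed over the items that both users have rated.
When r=1 it is the Manhattan distance; when r=2 it is the Euclidean distance.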
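As a worked example, the Minkowski distance between two users x and y can be written as d(x, y) = (sum over common items k of |x_k - y_k|^r)^(1/r). The sketch below applies it to Hailey's and Veronica's ratings from the data set used later in this post. Returning `None` when the two users share no items is an assumption made for this sketch, not something the original code does.

```python
def minkowski(rating1, rating2, r):
    """Minkowski distance over the items rated by both users.

    r=1 gives the Manhattan distance, r=2 the Euclidean distance.
    Returns None when there are no common items (assumption of this sketch).
    """
    common = [key for key in rating1 if key in rating2]
    if not common:
        return None
    total = sum(abs(rating1[key] - rating2[key]) ** r for key in common)
    return total ** (1 / r)


hailey = {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0,
          "The Strokes": 4.0, "Vampire Weekend": 1.0}
veronica = {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0,
            "Slightly Stoopid": 2.5, "The Strokes": 3.0}

# Common items: Norah Jones (4.0 vs 5.0) and The Strokes (4.0 vs 3.0).
d1 = minkowski(hailey, veronica, 1)  # |4-5| + |4-3| = 2.0
d2 = minkowski(hailey, veronica, 2)  # sqrt(1 + 1), about 1.414
print(d1, d2)
```

Only the two artists both users rated contribute to the distance; the other ratings are ignored.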
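2. Python implementation
Scenario: given users' ratings of a number of bands, recommend bands to a particular user;
Data: the `users` dictionary at the top of the code below, which maps each user to the bands they rated and the score they gave;
Implementation:
Python (the language syntax is not covered here; the code is given directly)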
#
#  FILTERINGDATA.py
#
#  Code file for the book Programmer's Guide to Data Mining
#  http://guidetodatamining.com
#  Ron Zacharski
#

from math import sqrt

users = {"Angelica": {"Blues Traveler": 3.5, "Broken Bells": 2.0, "Norah Jones": 4.5, "Phoenix": 5.0, "Slightly Stoopid": 1.5, "The Strokes": 2.5, "Vampire Weekend": 2.0},
         "Bill": {"Blues Traveler": 2.0, "Broken Bells": 3.5, "Deadmau5": 4.0, "Phoenix": 2.0, "Slightly Stoopid": 3.5, "Vampire Weekend": 3.0},
         "Chan": {"Blues Traveler": 5.0, "Broken Bells": 1.0, "Deadmau5": 1.0, "Norah Jones": 3.0, "Phoenix": 5, "Slightly Stoopid": 1.0},
         "Dan": {"Blues Traveler": 3.0, "Broken Bells": 4.0, "Deadmau5": 4.5, "Phoenix": 3.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 2.0},
         "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0, "The Strokes": 4.0, "Vampire Weekend": 1.0},
         "Jordyn": {"Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0, "Phoenix": 5.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 4.0},
         "Sam": {"Blues Traveler": 5.0, "Broken Bells": 2.0, "Norah Jones": 3.0, "Phoenix": 5.0, "Slightly Stoopid": 4.0, "The Strokes": 5.0},
         "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0, "Slightly Stoopid": 2.5, "The Strokes": 3.0}
        }


def manhattan(rating1, rating2):
    """Computes the Manhattan distance.
    Both rating1 and rating2 are dictionaries of the form
    {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}"""
    distance = 0
    commonRatings = False
    # only items rated by both users contribute to the distance
    for key in rating1:
        if key in rating2:
            distance += abs(rating1[key] - rating2[key])
            commonRatings = True
    if commonRatings:
        return distance
    else:
        return -1  # Indicates no ratings in common


def minkowski(rating1, rating2, r):
    """Computes the Minkowski distance with exponent r
    (r=1: Manhattan distance, r=2: Euclidean distance)."""
    distance = 0
    commonRatings = False
    for key in rating1:
        if key in rating2:
            distance += pow(abs(rating1[key] - rating2[key]), r)
            commonRatings = True
    if commonRatings:
        return pow(distance, 1 / r)
    else:
        return 0  # Indicates no ratings in common


def computeNearestNeighbor(username, users):
    """creates a sorted list of users based on their distance to username"""
    distances = []
    for user in users:
        if user != username:
            distance = manhattan(users[user], users[username])
            # distance = minkowski(users[user], users[username], 2)
            distances.append((distance, user))
    # sort based on distance -- closest first
    # (a "no ratings in common" sentinel of -1 or 0 would sort first; every pair
    # of users in this data set shares at least one rating, so it does not occur here)
    distances.sort()
    return distances


def recommend(username, users):
    """Give list of recommendations"""
    # first find nearest neighbor
    nearest = computeNearestNeighbor(username, users)[0][1]
    recommendations = []
    # now find bands neighbor rated that user didn't
    neighborRatings = users[nearest]
    userRatings = users[username]
    for artist in neighborRatings:
        if artist not in userRatings:
            recommendations.append((artist, neighborRatings[artist]))
    # using the fn sorted for variety - sort is more efficient
    return sorted(recommendations, key=lambda artistTuple: artistTuple[1], reverse=True)


# examples - uncomment to run
print(recommend('Hailey', users))
# prints [('Phoenix', 4.0), ('Blues Traveler', 3.0), ('Slightly Stoopid', 2.5)]
# print(recommend('Chan', users))
3. References:
http://www.guidetodatamining.com/chapter2/