ML/CF: Top-5 movie recommendations for users on the MovieLens ratings dataset with user-based collaborative filtering (cosine similarity)
Date: 2023-09-14 09:04:46
Related articles
ML/CF: Top-5 movie recommendations for users on the MovieLens ratings dataset with user-based collaborative filtering (cosine similarity): implementation code
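Top-5 movie recommendations on the MovieLens ratings dataset with user-based collaborative filtering (cosine similarity)
# 1. Define the dataset
Ratings data (sample rows):
userId | movieId | rating | timestamp |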
1 | 1 | 4 | 964982703 |
1 | 3 | 4 | 964981247 |
1 | 6 | 4 | 964982224 |
1 | 47 | 5 | 964983815 |
1 | 50 | 5 | 964982931 |
1 | 70 | 3 | 964982400 |
1 | 101 | 5 | 964980868 |
1 | 110 | 4 | 964982176 |
1 | 151 | 5 | 964984041 |
1 | 157 | 5 | 964984100 |
Movies data (sample rows; the genres field is itself pipe-separated):
movieId | title | genres |
1 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
2 | Jumanji (1995) | Adventure|Children|Fantasy |
3 | Grumpier Old Men (1995) | Comedy|Romance |
4 | Waiting to Exhale (1995) | Comedy|Drama|Romance |
5 | Father of the Bride Part II (1995) | Comedy |
6 | Heat (1995) | Action|Crime|Thriller |
7 | Sabrina (1995) | Comedy|Romance |
8 | Tom and Huck (1995) | Adventure|Children |
9 | Sudden Death (1995) | Action |
10 | GoldenEye (1995) | Action|Adventure|Thriller |
11 | American President, The (1995) | Comedy|Drama|Romance |
Ratings as loaded:
        userId  movieId  rating   timestamp
0            1        1     4.0   964982703
1            1        3     4.0   964981247
2            1        6     4.0   964982224
3            1       47     5.0   964983815
4            1       50     5.0   964982931
...        ...      ...     ...         ...
100831     610   166534     4.0  1493848402
100832     610   168248     5.0  1493850091
100833     610   168250     5.0  1494273047
100834     610   168252     5.0  1493846352
100835     610   170875     3.0  1493846415
[100836 rows x 4 columns]
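As a minimal sketch of the loading step, the snippet below parses a ratings file with the four columns shown above. It assumes the file is comma-separated with a header row; the source does not name the file or show its loading code, so `load_ratings` and the in-memory sample are hypothetical stand-ins. With pandas, `pd.read_csv` would give a DataFrame like the one printed above.

```python
import csv
import io


def load_ratings(file_obj):
    """Parse a ratings CSV into (userId, movieId, rating, timestamp) tuples."""
    reader = csv.DictReader(file_obj)
    return [
        (int(row["userId"]), int(row["movieId"]),
         float(row["rating"]), int(row["timestamp"]))
        for row in reader
    ]


# In-memory stand-in for the ratings file (assumption: comma-separated, with header).
sample_csv = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,1,4.0,964982703\n"
    "1,3,4.0,964981247\n"
    "1,6,4.0,964982224\n"
)
ratings = load_ratings(sample_csv)
print(ratings[0])
```

Passing an open file handle instead of the `StringIO` object would read the real file the same way.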
# 3. Model training and inference
# 3.1. Split the dataset into a training set and a test set
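One way to sketch the split is a seeded shuffle followed by a cut. The 20% test ratio and the seed are assumptions, since the source does not state the values it used, and `split_ratings` is a hypothetical helper name.

```python
import random


def split_ratings(rows, test_ratio=0.2, seed=42):
    """Shuffle rating rows and split them into (train, test) lists."""
    rng = random.Random(seed)          # local generator keeps the split reproducible
    shuffled = list(rows)              # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# (userId, movieId, rating) rows from the ratings sample in section 1
rows = [(1, 1, 4.0), (1, 3, 4.0), (1, 6, 4.0), (1, 47, 5.0), (1, 50, 5.0),
        (1, 70, 3.0), (1, 101, 5.0), (1, 110, 4.0), (1, 151, 5.0), (1, 157, 5.0)]
train, test = split_ratings(rows)
print(len(train), len(test))
```

A random row-level split like this can leave some movies only in the test set, which is consistent with the training matrix having fewer movie columns than the full dataset has movies.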
# 3.2. Further processing of the dataset
# Build the user-movie rating matrix
train_matrix
movieId  1       2       3       4       ...  193583  193585  193587  193609
userId                                   ...
1        4.0     NaN     4.0     NaN     ...  NaN     NaN     NaN     NaN
2        NaN     NaN     NaN     NaN     ...  NaN     NaN     NaN     NaN
3        NaN     NaN     NaN     NaN     ...  NaN     NaN     NaN     NaN
4        NaN     NaN     NaN     NaN     ...  NaN     NaN     NaN     NaN
5        NaN     NaN     NaN     NaN     ...  NaN     NaN     NaN     NaN
...      ...     ...     ...     ...     ...  ...     ...     ...     ...
606      2.5     NaN     NaN     NaN     ...  NaN     NaN     NaN     NaN
607      4.0     NaN     NaN     NaN     ...  NaN     NaN     NaN     NaN
608      2.5     2.0     NaN     NaN     ...  NaN     NaN     NaN     NaN
609      3.0     NaN     NaN     NaN     ...  NaN     NaN     NaN     NaN
610      NaN     NaN     NaN     NaN     ...  NaN     NaN     NaN     NaN
[610 rows x 8975 columns]
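The matrix above has one row per user and one column per movie, with NaN wherever a user has not rated a movie. A library-free sketch of the same structure is a nested dictionary that stores only the ratings that exist. `build_user_movie_matrix` is a hypothetical name; the source does not show which call it used, though pandas `pivot_table` with `index="userId"`, `columns="movieId"`, `values="rating"` yields the dense form printed above.

```python
from collections import defaultdict


def build_user_movie_matrix(rows):
    """Return {userId: {movieId: rating}}; unrated movies are simply absent."""
    matrix = defaultdict(dict)
    for user_id, movie_id, rating in rows:
        matrix[user_id][movie_id] = rating
    return dict(matrix)


# A few (userId, movieId, rating) cells that appear in the printed matrix
rows = [(1, 1, 4.0), (1, 3, 4.0), (608, 1, 2.5), (608, 2, 2.0)]
train_matrix = build_user_movie_matrix(rows)
print(train_matrix[608])
```

Absent keys play the role of NaN here, which keeps memory use proportional to the number of ratings rather than users times movies.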
# 3.3. Compute the similarity between users: cosine similarity
user_similarity
userId    1    2    3    4    5    6    7    8    9  ...  602  603  604  605  606  607  608  609  610
userId                                               ...
1         1    1    1    1    1    1    1    1    1  ...    1    1    1    1    1    1    1    1    1
2         1    1    1    1    1    1    1    1    1  ...    1    1    1    1    1    1    1    1    1
3         1    1    1    1    1    1    1    1    1  ...    1    1    1    1    1    1    1    1    1
4         1    1    1    1    1    1    1    1    1  ...    1    1    1    1    1    1    1    1    1
5         1    1    1    1    1    1    1    1    1  ...    1    1    1    1    1    1    1    1    1
...      ..   ..   ..   ..   ..   ..   ..   ..   ..  ...   ..   ..   ..   ..   ..   ..   ..   ..   ..
606       1    1    1    1    1    1    1    1    1  ...    1    1    1    1    1    1    1    1    1
607       1    1    1    1    1    1    1    1    1  ...    1    1    1    1    1    1    1    1    1
608       1    1    1    1    1    1    1    1    1  ...    1    1    1    1    1    1    1    1    1
609       1    1    1    1    1    1    1    1    1  ...    1    1    1    1    1    1    1    1    1
610       1    1    1    1    1    1    1    1    1  ...    1    1    1    1    1    1    1    1    1
[610 rows x 610 columns]
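Cosine similarity between two users is the dot product of their rating vectors divided by the product of the vectors' lengths. The sketch below assumes missing ratings count as 0, which is a common convention but is not confirmed by the source; the function names and the three-user example are hypothetical. Under that convention and with non-negative ratings, values fall between 0 and 1.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating dicts; missing ratings count as 0."""
    common = set(a) & set(b)                       # only co-rated movies add to the dot product
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:                 # a user with no ratings is similar to nobody
        return 0.0
    return dot / (norm_a * norm_b)


def user_similarity_matrix(matrix):
    """Return {user: {other_user: similarity}} for every pair of users."""
    users = sorted(matrix)
    return {u: {v: cosine_similarity(matrix[u], matrix[v]) for v in users}
            for u in users}


# Hypothetical toy matrix: users 1 and 2 agree fully, user 3 shares no movies with them
toy = {1: {1: 4.0, 3: 4.0}, 2: {1: 4.0, 3: 4.0}, 3: {2: 5.0}}
user_similarity = user_similarity_matrix(toy)
print(user_similarity[1])
```

Computing every pair this way is quadratic in the number of users, which is manageable for 610 users but would need a vectorised or approximate method on larger data.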
# 3.4. Model evaluation: compute precision and recall
     userId  movieId  rating
618     408   138036     5.0
123       1     2459     5.0
650     409     1234     5.0
162       1     3273     5.0
163       1     3386     5.0
precision: 0.026973684210526316
recall: 0.004065846886156287
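For top-N recommendation, precision and recall are commonly defined from the overlap between the recommended movies and the movies each user actually rated in the test set. The sketch below follows that definition, pooled over all users; whether the source computes its figures exactly this way is an assumption, and the example data is hypothetical.

```python
def precision_recall(recommended, relevant):
    """Pooled precision and recall.

    recommended: {user: list of recommended movie ids}
    relevant:    {user: set of movie ids the user rated in the test set}
    """
    hits = n_recommended = n_relevant = 0
    for user, rec in recommended.items():
        rel = set(relevant.get(user, ()))
        hits += len(set(rec) & rel)        # recommended movies that appear in the test set
        n_recommended += len(rec)
        n_relevant += len(rel)
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    return precision, recall


# Hypothetical example: 5 recommendations per user, one hit each
recommended = {1: [47, 50, 110, 151, 157], 2: [1, 3, 6, 47, 50]}
relevant = {1: {47, 70, 101}, 2: {2, 3}}
precision, recall = precision_recall(recommended, relevant)
print(precision, recall)
```

Low values such as those printed above are typical of this metric on sparse data, because a hit requires a recommended movie to be among the few held-out ratings for that user.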
# 3.5. Model inference: recommend movies for user 1
     userId  movieId  rating
460     405    32587     5.0
715     409     3814     5.0
286     410     3855     5.0
288     410     3910     5.0
487     406    56949     5.0
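A typical user-based recommender scores each movie the target user has not seen by the similarity-weighted average of the ratings given by the most similar users, then returns the top N. The sketch below does this for N = 5. The neighbourhood size `k`, the weighting rule, and the names are all assumptions; the source prints only the resulting rows and does not show its scoring rule.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating dicts; missing ratings count as 0."""
    dot = sum(a[m] * b[m] for m in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_top_n(matrix, target_user, n=5, k=2):
    """Rank unseen movies by the similarity-weighted mean rating of the k nearest users."""
    target = matrix[target_user]
    neighbours = sorted(
        ((cosine_similarity(target, ratings), user)
         for user, ratings in matrix.items() if user != target_user),
        reverse=True,
    )[:k]
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for sim, user in neighbours:
        if sim <= 0:                       # ignore users with no overlap
            continue
        for movie, rating in matrix[user].items():
            if movie not in target:        # never recommend an already rated movie
                weighted_sum[movie] += sim * rating
                weight_total[movie] += sim
    predicted = {m: weighted_sum[m] / weight_total[m] for m in weighted_sum}
    # Highest predicted rating first; ties broken by movie id for determinism
    return sorted(predicted, key=lambda m: (-predicted[m], m))[:n]


# Hypothetical toy matrix in {userId: {movieId: rating}} form
toy = {
    1: {1: 4.0, 3: 4.0},
    2: {1: 4.0, 3: 4.0, 6: 5.0, 47: 3.0},
    3: {2: 5.0, 50: 4.0},
}
top5 = recommend_top_n(toy, target_user=1)
print(top5)
```

In the toy data only user 2 overlaps with user 1, so the recommendations come from user 2's other ratings, ordered by rating.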