Visualizing Chinese word2vec Word Vectors in 2D with PCA: Code

Date: 2023-06-13 09:16:13

Notes for future reference.

%matplotlib inline


from jupyterthemes import jtplot
jtplot.style(theme='grade3')  # choose a plotting theme

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
import adjustText

from gensim.models.keyedvectors import KeyedVectors

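# The file is in word2vec text format despite its .bin name, hence binary=False.
# limit caps the number of vectors read from the start of the file.
word_vectors = KeyedVectors.load_word2vec_format(
    'C:/Users/yue/Desktop/1.bin',
    binary=False, limit=1000000)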

def plot_2d_representation_of_words(
    word_list, 
    word_vectors, 
    flip_x_axis = False,
    flip_y_axis = False,
    label_x_axis = "x",
    label_y_axis = "y", 
    label_label = "fruit"):

    pca = PCA(n_components = 2)

    word_plus_coordinates=[]

    for word in word_list: 
        current_row = []
        current_row.append(word)
        current_row.extend(word_vectors[word])
        word_plus_coordinates.append(current_row)

    word_plus_coordinates = pd.DataFrame(word_plus_coordinates)

    # Use every vector column (column 0 holds the word), whatever the embedding dimension
    coordinates_2d = pca.fit_transform(
        word_plus_coordinates.iloc[:, 1:])
    coordinates_2d = pd.DataFrame(
        coordinates_2d, columns=[label_x_axis, label_y_axis])
    coordinates_2d[label_label] = word_plus_coordinates.iloc[:,0]
    if flip_x_axis:
        coordinates_2d[label_x_axis] = \
        coordinates_2d[label_x_axis] * (-1)
    if flip_y_axis:
        coordinates_2d[label_y_axis] = \
        coordinates_2d[label_y_axis] * (-1)

    plt.figure(figsize = (5, 3))
    p1=sns.scatterplot(
        data=coordinates_2d, x=label_x_axis, y=label_y_axis)

    x = coordinates_2d[label_x_axis]
    y = coordinates_2d[label_y_axis]
    label = coordinates_2d[label_label]

    texts = [plt.text(x[i], y[i], label[i]) for i in range(len(x))]
    adjustText.adjust_text(texts)

import matplotlib as mpl
mpl.rcParams['font.sans-serif'] = ['STZhongsong']    # default font, so that plots can display Chinese characters
mpl.rcParams['axes.unicode_minus'] = False           # keep the minus sign '-' from rendering as a box in saved figures

#fruits = ['apple','orange','banana','lemon','car','tram','boat','bicycle','cherry','mango','grape','durian','watermelon','train','motorbike','ship',  'peach','pear','pomegranate','strawberry','bike','bus','truck','subway','airplane']
fruits = ['苹果', '自行车', '香蕉', '汽车', '人']  # apple, bicycle, banana, car, person

plot_2d_representation_of_words(
    word_list = fruits, 
    word_vectors = word_vectors, 
    flip_y_axis = True)
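
To make the PCA step above less of a black box, here is a minimal sketch of a 2D projection written with plain NumPy. The function name `pca_2d` and the toy numbers are made up for illustration and are not part of the original notebook. The result should match scikit-learn's `PCA(n_components=2)` only up to the sign of each axis; that sign is arbitrary, which is why flipping an axis with `flip_x_axis` or `flip_y_axis` does not change what the plot means.

```python
import numpy as np

def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    X = np.asarray(vectors, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    # Rows of Vt are the principal directions, ordered by explained variance
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T

# Toy 4-dimensional "embeddings" (made-up numbers, not real word vectors)
toy = np.array([
    [1.0, 0.9, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
    [0.5, 0.5, 0.5, 0.5],
])
coords = pca_2d(toy)
print(coords.shape)  # one (x, y) pair per input row
```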

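This was run in a Jupyter notebook, using the Chinese word vectors from Tencent AI Lab. Download the archive, extract it, and rename the innermost .txt file to .bin. The file is still plain text after renaming, which is why it is loaded with binary = False.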
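
For reference, the word2vec text format is simple: a header line with the vocabulary size and the vector dimension, then one line per word with the word followed by its vector components. The sketch below parses that layout in plain Python. The helper `read_word2vec_text` and the sample data are hypothetical and only illustrate the format; for the real file, gensim's `load_word2vec_format` with `binary=False` is what the notebook uses.

```python
import io

def read_word2vec_text(stream, limit=None):
    """Parse word2vec text format: a 'count dim' header, then one word per line."""
    header = stream.readline().split()
    dim = int(header[1])  # header[0] is the vocabulary size
    vectors = {}
    for line in stream:
        if limit is not None and len(vectors) >= limit:
            break  # same idea as gensim's limit: stop after the first N words
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the declared dimension
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return dim, vectors

# Made-up sample with 3 words of dimension 4
sample = io.StringIO(
    "3 4\n"
    "苹果 0.1 0.2 0.3 0.4\n"
    "香蕉 0.2 0.1 0.4 0.3\n"
    "汽车 -0.5 0.9 0.0 0.1\n"
)
dim, vectors = read_word2vec_text(sample, limit=2)
print(dim, list(vectors))
```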

The plot shows the semantic relationships between the words.
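
Because the 2D projection discards most of the embedding's dimensions, distances in the plot are only approximate. One way to cross-check a relationship is cosine similarity in the full vector space. The sketch below uses made-up 3-dimensional vectors, and `cosine_similarity` is a helper written for illustration; with the loaded model, gensim's `word_vectors.similarity(w1, w2)` computes the same measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (made-up numbers, not taken from the real embeddings)
toy_vectors = {
    "苹果": [0.9, 0.8, 0.1],  # apple
    "香蕉": [0.8, 0.9, 0.2],  # banana
    "汽车": [0.1, 0.2, 0.9],  # car
}
fruit_sim = cosine_similarity(toy_vectors["苹果"], toy_vectors["香蕉"])
car_sim = cosine_similarity(toy_vectors["苹果"], toy_vectors["汽车"])
print(fruit_sim > car_sim)  # the two fruits are closer to each other than apple is to car
```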
