Counting words with the re module

Time: 2023-09-11 14:16:16

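from collections import defaultdict
import re

# Word -> count; missing keys start at 0.
d = defaultdict(int)

# Open read-only in text mode; counting never writes to the file.
with open(r'e:/bb.txt', mode='rt', encoding='utf8') as f:
    for line in f:
        # Split on runs of characters that are neither word characters nor hyphens.
        for sub in re.split(r'[^\w-]+', line):
            if sub:  # skip the empty strings split() yields at line edges
                d[sub] += 1

# Approach 1: sort the keys by their count and print the top 10.
for p in sorted(d, key=lambda x: d[x], reverse=True)[:10]:
    print(p, d[p])

# Approach 2: sort the (word, count) pairs and print the top 10.
for word, count in sorted(d.items(), key=lambda x: x[1], reverse=True)[:10]:
    print(word, count)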

import re
from collections import defaultdict

d = defaultdict(int)
# Compile the pattern once and reuse it for every line.
regex = re.compile(r'[^\w-]+')

with open('e:/bb.txt', mode='rt', encoding='utf8') as f:
    for line in f:
        for sub in regex.split(line):
            if sub:
                # Lowercase so that 'The' and 'the' are counted together.
                d[sub.lower()] += 1
print(d)

# Print the 10 most frequent words.
for p in sorted(d, key=lambda u: d[u], reverse=True)[:10]:
    print(p, d[p])


def wordcount(path: str):
    """Return a word -> count mapping for the file at path (keys are uppercased)."""
    d = defaultdict(int)
    with open(path, mode='rt', encoding='utf8') as f:
        for line in f:
            for sub in regex.split(line):
                if sub:
                    d[sub.upper()] += 1
    return d


# Each item is a (word, count) tuple.
for p in sorted(wordcount('e:/bb.txt').items(), key=lambda m: m[1], reverse=True)[:10]:
    print(p)
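
The sort-and-take-ten step in the scripts above can also be handed to collections.Counter, whose most_common() method returns the highest counts directly. The following is only a sketch of that alternative, not the method used above: the names WORD_RE and top_words and the sample sentences are made up for illustration, and the function takes any iterable of lines (an open file works) so that it can be tried without e:/bb.txt. It matches tokens with findall instead of splitting on their complement, which yields the same words without the empty strings.

```python
import re
from collections import Counter

# Hypothetical name: same token definition as above (word characters and hyphens).
WORD_RE = re.compile(r'[\w-]+')


def top_words(lines, n=10):
    """Return the n most common lowercase words in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # findall returns only the matching tokens, so no empty strings to filter.
        counts.update(word.lower() for word in WORD_RE.findall(line))
    return counts.most_common(n)


# Illustrative sample data; pass an open file object to count a real file.
sample = ["The cat sat on the mat.", "The well-known cat slept."]
for word, count in top_words(sample, n=2):
    print(word, count)
```

To count a file, open it as in the scripts above and pass the file object: top_words(f) iterates over it line by line.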
