Using Python to read files from HDFS and grep out the data you want

Posted: 2023-09-14 09:11:51

The code is as follows:

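import subprocess


# Walk every hour of 2018-10-24 through 2018-10-29; one output file per hour.
for day in range(24, 30):
    for h in range(0, 24):
        filename = "tls-metadata-2018-10-%02d-%02d.txt" % (day, h)
        cmd = "hdfs dfs -text /data/2018/10/%02d/%02d/*.snappy" % (day, h)
        print(cmd)
        # cmd = "cat *.py"  # local stand-in for testing without HDFS
        # universal_newlines=True makes stdout yield str rather than bytes, so
        # line.split("^") also works on Python 3 (errors= needs Python 3.6+).
        proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                                universal_newlines=True, errors="replace")
        with open(filename, "w") as f:
            for line in proc.stdout:
                try:
                    arr = line.split("^")
                    # Keep records with at least 120 fields where field 6 is "6",
                    # field 25 is "SSL", and field 107 is non-empty.
                    if len(arr) >= 120 and arr[6] == "6" and arr[25] == "SSL" and arr[107]:
                        # Write fields 0-31, 95, and 105-118, joined by "^".
                        f.write("^".join(arr[:32]) + "^" + arr[95] + "^" + "^".join(arr[105:119]) + "\n")
                except Exception as e:
                    print(e, "error while processing line", line)
        proc.wait()  # reap the child process before moving to the next hour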
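
The filtering step can be checked without an HDFS cluster by pulling it out into a small function. The following is only a sketch under assumptions: `select_fields` and `make_line` are hypothetical names that do not appear in the original script, and the sample record is synthetic data made up for the check, not real HDFS output. The field positions (6, 25, 95, 105-118, 107) and the "^" separator are taken from the script above.

```python
def select_fields(line, sep="^"):
    """Return the projected record, or None if the line does not match.

    Mirrors the condition used in the script: at least 120 fields,
    field 6 == "6", field 25 == "SSL", and field 107 non-empty.
    """
    arr = line.rstrip("\n").split(sep)
    if len(arr) >= 120 and arr[6] == "6" and arr[25] == "SSL" and arr[107]:
        # Fields 0-31, then field 95, then fields 105-118.
        return sep.join(arr[:32] + [arr[95]] + arr[105:119])
    return None


def make_line(overrides=None, width=120, sep="^"):
    """Build a synthetic record for testing (hypothetical data)."""
    fields = ["f%d" % i for i in range(width)]
    fields[6] = "6"
    fields[25] = "SSL"
    for index, value in (overrides or {}).items():
        fields[index] = value
    return sep.join(fields) + "\n"


if __name__ == "__main__":
    record = select_fields(make_line())
    print(len(record.split("^")))                    # 47 = 32 + 1 + 14 fields
    print(select_fields(make_line({25: "HTTP"})))    # None: field 25 is not "SSL"
```

With a helper like this, the inner loop of the script could reduce to calling `select_fields(line)` and writing the result followed by a newline whenever it is not `None`.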
