Collecting IIS Logs into HDFS with Flume

Time: 2023-09-11 14:22:41

1. Download Flume 1.7

Download Flume 1.7 from the official website.

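2. Configure Flume

My initial plan was a direct pipeline: IIS --> Flume --> HDFS

However, collection kept failing because the agent could not connect directly to the remote HDFS: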

22 二月 2017 14:59:04,566 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:443)  - HDFS IO error
java.io.IOException: Callable timed out after 10000 ms on file: hdfs://192.168.1.75:9008/iis/2017-02-22/u_ex151127.log.1487746609021.tmp
    at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:682)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:232)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:504)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:406)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:675)
    ... 6 more

So I settled on a workaround: the Flume agent on Windows forwards events to a Flume agent on Linux, which then writes them to HDFS.

IIS --> Flume (Windows) --> Flume (Linux) --> HDFS
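
For reference, the 10000 ms in the trace matches the default of the HDFS sink's hdfs.callTimeout setting in Flume 1.7, which limits how long operations such as open, write, flush and close may take. The fragment below is only a sketch of how that limit could be raised on an agent that writes to HDFS directly. The agent and sink names (a1, k1) are placeholders and the 60000 value is an arbitrary assumption; none of this was part of the original setup, and a longer timeout will not help if the NameNode or DataNodes are simply unreachable from the Windows host.

```properties
# Hypothetical tweak, not from the original post.
# The default of 10000 ms is what produced "Callable timed out after 10000 ms".
a1.sinks.k1.type = hdfs
# Allow each HDFS call up to 60 seconds (assumed value)
a1.sinks.k1.hdfs.callTimeout = 60000
```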

The configuration file for the collecting agent on Windows is as follows:

a1.sources = r1
a1.sinks = k1
a1.channels = c1
 
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = C:\\inetpub\\logs\\LogFiles\\W3SVC4
a1.sources.r1.fileHeader = true
a1.sources.r1.basenameHeader = true
a1.sources.r1.basenameHeaderKey = fileName
a1.sources.r1.ignorePattern = ^(.)*\\.tmp$
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.1.75
a1.sinks.k1.port = 44444
 
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
a1.channels.c1.keep-alive = 30

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

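The key points are that the sink is pointed at the address of the Flume agent on Linux, and that the spooling directory is the log directory of one IIS site: C:\inetpub\logs\LogFiles\W3SVC4 (the backslashes are doubled in the properties file because a backslash is an escape character there).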
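
The spooling directory source expects every file in the directory to be complete and unchanging. According to the Flume documentation, a file that is written to after the source has picked it up causes an error and stops processing. Since IIS keeps appending to the current log file, this is worth keeping in mind. The source also renames each fully ingested file rather than deleting it. The fragment below is a sketch that only writes out those defaults explicitly; it assumes the original setup left them unchanged and does not alter the behaviour of the config above.

```properties
# Defaults of the spooldir source, written out for visibility
# Suffix appended to a file name once the file has been fully ingested
a1.sources.r1.fileSuffix = .COMPLETED
# never = keep ingested files; immediate = delete them after ingestion
a1.sources.r1.deletePolicy = never
```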

The configuration for the receiving agent on Linux is as follows:

tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1

# Avro source: receives events from the Windows agent
tier1.sources.source1.type = avro
tier1.sources.source1.bind = 192.168.1.75
tier1.sources.source1.port = 44444
tier1.sources.source1.channels = channel1

# Memory channel
tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000
tier1.channels.channel1.keep-alive = 30

# HDFS sink
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = hdfs://127.0.0.1:9008/iis
# Write events as plain text instead of a SequenceFile
tier1.sinks.sink1.hdfs.writeFormat = Text
tier1.sinks.sink1.hdfs.fileType = DataStream
# 0 disables rolling by time, size and event count
tier1.sinks.sink1.hdfs.rollInterval = 0
tier1.sinks.sink1.hdfs.rollSize = 0
tier1.sinks.sink1.hdfs.rollCount = 0
tier1.sinks.sink1.hdfs.filePrefix = localhost-%Y-%m-%d
# Resolve %Y-%m-%d from the agent's local time
tier1.sinks.sink1.hdfs.useLocalTimeStamp = true
# Close a file after 60 seconds without writes
tier1.sinks.sink1.hdfs.idleTimeout = 60
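
The error trace earlier shows a file being opened under a dated directory (/iis/2017-02-22/), while the hdfs.path above places every file directly under /iis. If one directory per day is wanted, a possible variant is sketched below. It relies on the HDFS sink's escape sequences, which need either a timestamp header on each event or hdfs.useLocalTimeStamp = true. The exact path layout is an assumption modelled on the trace, not the author's confirmed setting.

```properties
# Assumed variant: bucket output into one HDFS directory per day
tier1.sinks.sink1.hdfs.path = hdfs://127.0.0.1:9008/iis/%Y-%m-%d
# Take the date from the agent's local clock
tier1.sinks.sink1.hdfs.useLocalTimeStamp = true
```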
 

3. Start Flume on Linux

./flume-ng agent -c ../conf -f ../conf/avro_hdfs.conf -n tier1 -Dflume.root.logger=DEBUG,console

4. Start Flume on Windows

Run the command from Flume's bin directory:

flume-ng.cmd agent --conf ..\conf --conf-file ..\conf\avro.conf --name a1