Oozie-3.3.2: Installation, Configuration, and Running in Practice

Time: 2023-09-14 08:57:29

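Oozie is an open-source workflow scheduling system. It manages multiple Hadoop jobs with complex logic and runs them together in a specified order. For example, suppose a business system produces 20 GB of raw data every day, and we need to process it daily in the following steps: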

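1. Use Hadoop to sync the raw data to HDFS.
2. Transform the raw data with the MapReduce framework and store the results as partitioned tables across several Hive tables.
3. JOIN the data from multiple Hive tables to produce one large Hive table of detail data.
4. Run complex statistical analysis on the detail data to produce sorted report information.
5. Sync the analysis results back to the business system, where business calls can use them.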

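A workflow system can orchestrate the tasks above into a single workflow instance, which is then started on a schedule every day. In scenarios like this, which depend on Hadoop's storage and processing capabilities, Oozie may simplify task scheduling and execution.
Here we install Oozie-3.3.2 on CentOS 6.2. This requires several dependency packages, so we go through the installation step by step, including installing and configuring the dependencies. We use a MySQL database to store Oozie's data, and the Hadoop version is 1.2.1.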

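Installing the Oozie Server

The Oozie Server provides many convenient job-management features, such as a web UI for managing the run state of jobs. It also lets us build flows made up of multiple complex Hadoop jobs: the dependencies among the jobs can all be assembled in a single workflow configuration file, which the Oozie Server then manages and executes.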

Installing the Maven Build Tool

To download and install it, run the following command:

wget http://mirrors.hust.edu.cn/apache/maven/maven-3/3.2.1/binaries/apache-maven-3.2.1-bin.tar.gz

If you use MySQL to store Oozie's data, copy the MySQL driver into the Tomcat installation directory, that is, into $CATALINA_HOME/lib.
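The copy step can be wrapped in a small helper that checks both paths first. This is only a sketch: the function name install_jdbc_driver and the example path in the comment are assumptions, not part of the original setup.

```shell
#!/bin/sh
# Copy a JDBC driver jar into Tomcat's shared lib directory.
# Usage: install_jdbc_driver <driver-jar> <catalina-home>
install_jdbc_driver() {
  driver_jar="$1"
  catalina_home="$2"
  if [ ! -f "$driver_jar" ]; then
    echo "driver jar not found: $driver_jar" >&2
    return 1
  fi
  if [ ! -d "$catalina_home/lib" ]; then
    echo "not a Tomcat home (no lib directory): $catalina_home" >&2
    return 1
  fi
  cp "$driver_jar" "$catalina_home/lib/"
}

# Example call (the jar location is an assumption; adjust to your layout):
# install_jdbc_driver ~/packages/mysql-connector-java-5.1.29-bin.jar "$CATALINA_HOME"
```

Failing early on a missing jar or a wrong CATALINA_HOME avoids a silent misconfiguration that would only surface later as a JDBC driver error at server startup.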

Preparing the ExtJS Package

Download the ExtJS archive:

wget http://extjs.com/deploy/ext-2.2.zip

Installing Oozie

To download and install it, run the following command:

wget http://mirror.bit.edu.cn/apache/oozie/3.3.2/oozie-3.3.2.tar.gz
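The downloaded tarball is a source release, so it has to be unpacked and built before use. A minimal sketch of those steps follows, assuming the standard bin/mkdistro.sh build script from the Oozie source tree and skipping tests to shorten the build. The run helper and its DRY_RUN switch are illustrative additions; any extra Maven options your Hadoop version may require are not shown.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Unpack the source tarball and build the distribution with Maven.
build_oozie() {
  version="$1"
  run tar xzf "oozie-${version}.tar.gz"
  run cd "oozie-${version}"
  run bin/mkdistro.sh -DskipTests
}

build_oozie 3.3.2
```

Running the script as written only prints the three commands, which makes it easy to review them before setting DRY_RUN=0.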

After the build completes, the built files appear under oozie-3.3.2/distro/target. In my case the path is /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2, with contents as shown below:

[shirdrn@oozie-server oozie-3.3.2]$ pwd
export OOZIE_HOME=/home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2

Create a libext directory under the directory above and copy the Hadoop jar files into it. I am using Hadoop 1.2.1:

[shirdrn@oozie-server oozie-3.3.2]$ mkdir libext
[shirdrn@oozie-server oozie-3.3.2]$ cp ~/cloud/programs/hadoop-1.2.1/hadoop-*.jar libext/
[shirdrn@oozie-server oozie-3.3.2]$ cp ~/cloud/programs/hadoop-1.2.1/lib/*.jar ./libext/

Because we use MySQL to store Oozie's metadata, the MySQL driver also needs to be added to the libext directory:

cp ~/packages/mysql-connector-java-5.1.29/mysql-connector-java-5.1.29/mysql-connector-java-5.1.29-bin.jar libext/
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/asm-3.2.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/aspectjrt-1.6.11.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/aspectjtools-1.6.11.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-beanutils-1.7.0.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-beanutils-core-1.8.0.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-cli-1.2.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-codec-1.4.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-collections-3.2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-configuration-1.6.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-daemon-1.0.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-digester-1.8.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-el-1.0.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-httpclient-3.0.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-io-2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-lang-2.4.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-logging-1.1.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-logging-api-1.0.4.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-math-2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/commons-net-3.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/core-3.1.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/hadoop-ant-1.2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/hadoop-capacity-scheduler-1.2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/hadoop-client-1.2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/hadoop-core-1.2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/hadoop-examples-1.2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/hadoop-fairscheduler-1.2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/hadoop-minicluster-1.2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/hadoop-test-1.2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/hadoop-thriftfs-1.2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/hadoop-tools-1.2.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/hsqldb-1.8.0.10.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/jackson-core-asl-1.8.8.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/jackson-mapper-asl-1.8.8.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/jasper-compiler-5.5.12.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/jasper-runtime-5.5.12.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/jdeb-0.8.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/jersey-core-1.8.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/jersey-json-1.8.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/jersey-server-1.8.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/jets3t-0.6.1.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/jetty-6.1.26.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/jetty-util-6.1.26.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/jsch-0.1.42.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/junit-4.5.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/kfs-0.2.2.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/log4j-1.2.15.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/mockito-all-1.8.5.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/mysql-connector-java-5.1.29-bin.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/oro-2.0.8.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/servlet-api-2.5-20081211.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/slf4j-api-1.4.3.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/slf4j-log4j12-1.4.3.jar
INFO: Adding extension: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libext/xmlenc-0.52.jar
New Oozie WAR file with added ExtJS library, JARs at /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/oozie-server/webapps/oozie.war

At this point the file /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/oozie-server/webapps/oozie.war has been generated.

Configuring Oozie

Edit the conf/oozie-site.xml configuration file so that it contains the following:

property
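For a MySQL-backed setup, the relevant oozie-site.xml entries are the JPAService JDBC properties. The sketch below prints such a fragment so it can be pasted into the file. The host, database user, and password are placeholders rather than the values used in this installation, and the helper names xml_property and oozie_mysql_properties are hypothetical.

```shell
#!/bin/sh
# Print one <property> element in the *-site.xml style.
xml_property() {
  printf '    <property>\n        <name>%s</name>\n        <value>%s</value>\n    </property>\n' "$1" "$2"
}

# Print JDBC settings that point Oozie's JPAService at a MySQL database.
# Arguments: <mysql-host> <database> <user> <password>
oozie_mysql_properties() {
  host="$1"; db="$2"; user="$3"; pass="$4"
  xml_property oozie.service.JPAService.jdbc.driver   com.mysql.jdbc.Driver
  xml_property oozie.service.JPAService.jdbc.url      "jdbc:mysql://${host}:3306/${db}"
  xml_property oozie.service.JPAService.jdbc.username "$user"
  xml_property oozie.service.JPAService.jdbc.password "$pass"
}

# Placeholder values; replace them with your own database settings.
oozie_mysql_properties localhost oozie oozie_user change-me
```

The driver class com.mysql.jdbc.Driver matches the 5.1.x connector used in this article; a different connector version may use a different class name.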

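By default, the Oozie configuration contains the setting oozie.service.JPAService.create.db.schema with the value false, which means the database schema is not created automatically. We keep this default so that we can create the Oozie database by hand and control access to it. Next, create a database named oozie in MySQL and grant access to it: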

CREATE DATABASE oozie;
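The access grant that accompanies the database creation could look like the sketch below, which prints the SQL so it can be reviewed and then piped into the mysql client. The user name, host pattern, and password are hypothetical, and the GRANT ... IDENTIFIED BY form is MySQL 5.x syntax; newer servers need CREATE USER followed by GRANT.

```shell
#!/bin/sh
# Print SQL that creates the Oozie database and grants one user access to it.
# Arguments: <database> <user> <host-pattern> <password>
oozie_db_sql() {
  db="$1"; user="$2"; host="$3"; pass="$4"
  cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'${host}' IDENTIFIED BY '${pass}';
FLUSH PRIVILEGES;
EOF
}

# Hypothetical user, host pattern, and password:
oozie_db_sql oozie oozie_user '%' change-me

# To apply the statements:
#   oozie_db_sql oozie oozie_user '%' change-me | mysql -u root -p
# Oozie's tables are then typically created from the Oozie home directory with:
#   bin/ooziedb.sh create -sqlfile oozie.sql -run
```

Restricting the host pattern to the Oozie server's host name instead of '%' is a tighter choice when the database is reachable from other machines.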

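Check the console output: there are no errors, and the oozie.sql script file has been generated in the current directory. The generated tables are visible in the MySQL database, which shows that the operation succeeded.
Oozie can now be started with the following command: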

bin/oozied.sh start

The startup output looks like this:

Setting OOZIE_HOME: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2
Setting OOZIE_CONFIG: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/conf
Sourcing: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/conf/oozie-env.sh
Setting OOZIE_DATA: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/data
Setting OOZIE_LOG: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/logs
Setting CATALINA_BASE: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/oozie-server
Setting CATALINA_OUT: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/logs/catalina.out
Setting CATALINA_PID: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/oozie-server/temp/oozie.pid
Using CATALINA_OPTS: -Xmx1024m -Dderby.stream.error.file=/home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/logs/derby.log
Adding to CATALINA_OPTS: -Doozie.home.dir=/home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2 -Doozie.config.dir=/home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/conf -Doozie.log.dir=/home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/logs -Doozie.data.dir=/home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/data -Doozie.config.file=oozie-site.xml -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10 -Doozie.http.hostname=m1 -Doozie.admin.port=11001 -Doozie.http.port=11000 -Doozie.https.port=11443 -Doozie.base.url=http://m1:11000/oozie -Doozie.https.keystore.file=/home/shirdrn/.keystore -Doozie.https.keystore.pass=password -Djava.library.path=
Using CATALINA_BASE: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/oozie-server
Using CATALINA_TMPDIR: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/oozie-server/temp
Using CLASSPATH: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/oozie-server/bin/tomcat-juli.jar:/home/shirdrn/cloud/programs/apache-tomcat-7.0.52/bin/bootstrap.jar
Using CATALINA_PID: /home/shirdrn/cloud/programs/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/oozie-server/temp/oozie.pid

As the log above shows, the Oozie management console is available at http://oozie-server:11000/oozie, where you can see the graphical interface.
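Besides opening the console in a browser, the server can be checked from the command line with the oozie admin subcommand. The sketch below only builds and prints that command; oozie_url is a hypothetical helper, and 11000 is the HTTP port shown in the startup log.

```shell
#!/bin/sh
# Build the Oozie base URL from a host name and an optional port.
oozie_url() {
  echo "http://$1:${2:-11000}/oozie"
}

# Print the status check command; run it from the Oozie home directory.
echo "bin/oozie admin -oozie $(oozie_url oozie-server) -status"
```

On a healthy server the status command normally reports the system mode as NORMAL.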

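Integrating Oozie with Hadoop

Our Hadoop platform runs as user shirdrn in group shirdrn, and we configure the Hadoop proxy user to be that same user. The host on which Oozie is deployed is named oozie-server. Edit Hadoop's core-site.xml configuration file and add the following: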

<!-- OOZIE -->
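Hadoop proxy-user settings in core-site.xml generally take the form hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups. The sketch below prints such a fragment for the user, host, and group named in this section; the helper names are hypothetical, and the exact values should be checked against your own cluster.

```shell
#!/bin/sh
# Print one <property> element in the *-site.xml style.
xml_property() {
  printf '    <property>\n        <name>%s</name>\n        <value>%s</value>\n    </property>\n' "$1" "$2"
}

# Print the proxy-user properties for one user.
# Arguments: <user> <allowed-hosts> <allowed-groups>
proxyuser_properties() {
  user="$1"; hosts="$2"; grps="$3"
  xml_property "hadoop.proxyuser.${user}.hosts"  "$hosts"
  xml_property "hadoop.proxyuser.${user}.groups" "$grps"
}

# User shirdrn may impersonate members of group shirdrn from host oozie-server.
proxyuser_properties shirdrn oozie-server shirdrn
```

Changes to core-site.xml usually take effect only after the Hadoop daemons reload their configuration, for example after a restart.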

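Installing the Oozie Client

Workflow jobs can be submitted from an external Oozie client. The client is simply a program that talks to the Oozie Server: it submits the job, and the Oozie Server invokes and runs it.
Go back to the directory where the Oozie release package oozie-3.3.2.tar.gz was unpacked. After the earlier build it contains a client directory, which holds the Oozie client files. In my case the path containing the Oozie client script is /home/shirdrn/cloud/programs/oozie-3.3.2/client/target/oozie-client-3.3.2-client/oozie-client-3.3.2.
To view the help for the Oozie client's job commands, run the following: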

cd /home/shirdrn/cloud/programs/oozie-3.3.2/client/target/oozie-client-3.3.2-client/oozie-client-3.3.2

The Oozie release package ships with examples. In my case they are in /home/shirdrn/cloud/programs/oozie-3.3.2/examples/target/oozie-examples-3.3.2-examples/examples/apps, and running them is a good way to verify that the installation works.
First, upload the bundled examples to HDFS:

bin/hadoop fs -mkdir /oozie
bin/hadoop fs -copyFromLocal /home/shirdrn/cloud/programs/oozie-3.3.2/examples/target/oozie-examples-3.3.2-examples/examples /user/shirdrn/examples

We use the map-reduce example for verification. Edit its job.properties file so that it contains the following:

nameNode=hdfs://m1:9000
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
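The application path above refers to ${examplesRoot}, and the workflow also needs a JobTracker address, so a complete job.properties has a few more entries. The sketch below writes one using the NameNode and JobTracker addresses of this environment (hdfs://m1:9000 and m1:19830). The queueName, examplesRoot, and outputDir values are assumptions based on the stock example file, and write_job_properties is a hypothetical helper.

```shell
#!/bin/sh
# Write a job.properties file for the map-reduce example.
# The quoted 'EOF' keeps ${...} literal so that Oozie, not the shell, expands it.
write_job_properties() {
  cat > "$1" <<'EOF'
nameNode=hdfs://m1:9000
jobTracker=m1:19830
queueName=default
examplesRoot=examples

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
EOF
}

# Write to a scratch location first, then copy it over the example's file.
write_job_properties "${TMPDIR:-/tmp}/job.properties"
```

With examplesRoot=examples and user shirdrn, the application path resolves to /user/shirdrn/examples/apps/map-reduce on HDFS, which matches the upload target used earlier.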

In my environment the NameNode address is hdfs://m1:9000 and the JobTracker is m1:19830. To run the job, execute the following commands:

cd /home/shirdrn/cloud/programs/oozie-3.3.2/client/target/oozie-client-3.3.2-client/oozie-client-3.3.2
bin/oozie job -oozie http://oozie-server:11000/oozie -config /home/shirdrn/cloud/programs/oozie-3.3.2/examples/target/oozie-examples-3.3.2-examples/examples/apps/map-reduce/job.properties -run

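Submitted jobs can be viewed in the Oozie web management console:

[Figure: job list in the Oozie web console]

The console also shows each job's configuration, run status, and related information:

[Figure: job details in the Oozie web console]

The -run option in the command above runs a job directly. Other options are available as well: -submit submits a job, -rerun reruns a job, -suspend suspends a job, and so on. See the command help or the relevant documentation for details.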
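A few of the other job subcommands can be sketched as follows. The helper oozie_job_cmd is hypothetical and only prints each command, and <job-id> stands for the job ID that the -run command printed.

```shell
#!/bin/sh
OOZIE_URL="http://oozie-server:11000/oozie"

# Print the oozie CLI command for a job action such as -info, -suspend,
# -resume, or -kill. Arguments: <action> <job-id>
oozie_job_cmd() {
  action="$1"; job_id="$2"
  echo "bin/oozie job -oozie ${OOZIE_URL} ${action} ${job_id}"
}

oozie_job_cmd -info    '<job-id>'   # show the status and actions of a job
oozie_job_cmd -suspend '<job-id>'   # pause a running job
oozie_job_cmd -resume  '<job-id>'   # continue a suspended job
oozie_job_cmd -kill    '<job-id>'   # stop a job for good
```

The oozie client also reads the server URL from the OOZIE_URL environment variable, so exporting it once lets you drop the -oozie option from each call.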