[Hadoop] Sqoop 1.4.2 Documentation (Part 3): Sqoop Job and Other Operations

一、sqoop job command-line arguments
usage: sqoop job [GENERIC-ARGS] [JOB-ARGS] [-- [<tool-name>] [TOOL-ARGS]]

Job management arguments:
   --create <job-id>            Create a new saved job
   --delete <job-id>            Delete a saved job
   --exec <job-id>              Run a saved job
   --help                       Print usage instructions
   --list                       List saved jobs
   --meta-connect <jdbc-uri>    Specify JDBC connect string for the
                                metastore
   --show <job-id>              Show the parameters for a saved job
   --verbose                    Print more information while working

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>                    specify an application configuration file
-D <property=value>                           use value for given property
-fs <local|namenode:port>                     specify a namenode
-jt <local|jobtracker:port>                   specify a job tracker
-files <comma separated list of files>        specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>       specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>  specify comma separated archives to be unarchived on the compute machines.
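
As a hedged illustration of the "must precede any tool-specific arguments" rule (the property name and value are only an example, not from the original), a generic -D option goes right after the tool name, before any Sqoop-specific options:
$ sqoop import -D mapred.job.name=nightly-import \
    --connect jdbc:mysql://example.com/db --table mytable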



At first glance, there are far fewer parameters here than for import and export, so this part is easier to cover.

The purpose of a job is to automate import/export work that runs frequently without changing. For example, you can create a job that performs a daily incremental import to pull in the newest data.

For example, to create a job:
$ sqoop job --create myjob -- import --connect jdbc:mysql://example.com/db \

 --table mytable
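
As a sketch of the daily incremental-import use case mentioned above (the check column and starting value are illustrative, not from the original), the saved job can include Sqoop's incremental options; when such a job is run with --exec, the last imported value recorded in the metastore is updated after each run:
$ sqoop job --create daily-import -- import --connect jdbc:mysql://example.com/db \
    --table mytable --incremental append --check-column id --last-value 0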


List the saved jobs:
$ sqoop job --list

Available jobs:

 myjob



Show the details of a job:
 $ sqoop job --show myjob

 Job: myjob

 Tool: import

 Options:

 ----------------------------

 direct.import = false

 codegen.input.delimiters.record = 0

 hdfs.append.dir = false

 db.table = mytable

 ...


Run a job:
$ sqoop job --exec myjob

10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation

...

You can override parameters at execution time, for example when the database username and password have changed:
$ sqoop job --exec myjob -- --username someuser -P

Enter password:

...
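
A job that is no longer needed can be removed with --delete (listed in the job-management arguments above):
$ sqoop job --delete myjob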

Job operations only work when the machine has a metastore Sqoop can reach (a private one, or one specified with --meta-connect); otherwise you will get an error like the following:
[work@vm-nba01 ~]$ sqoop job --list

12/10/24 16:38:34 ERROR tool.JobTool: There is no JobStorage implementation available

12/10/24 16:38:34 ERROR tool.JobTool: that can read your specified storage descriptor.

12/10/24 16:38:34 ERROR tool.JobTool: Dont know where to save this job info! You may

12/10/24 16:38:34 ERROR tool.JobTool: need to specify the connect string with --meta-connect.


In other words, Sqoop does not know where your jobs are stored, so every job operation fails.
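
A hedged sketch of the quick fix (the host name is illustrative; 16000 is Sqoop's default metastore port): point the job tool at a reachable metastore explicitly with --meta-connect, for example:
$ sqoop job --list --meta-connect jdbc:hsqldb:hsql://metastore.example.com:16000/sqoop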

二、Metastore connection options 
The job --list error above can be resolved here; this section shows how to set up storage for saved jobs. The relevant argument:
Argument                   Description
--meta-connect <jdbc-uri>  Specifies the JDBC connect string used to connect to the metastore

By default, you get a private metastore in the $HOME/.sqoop directory; with the sqoop-metastore tool you can instead set up a hosted (shared) metastore.
By default, a private metastore is instantiated in $HOME/.sqoop. If you have configured a hosted metastore with the sqoop-metastore tool, you can connect to it by specifying the --meta-connect argument. This is a JDBC connect string just like the ones used to connect to databases for import. 

In conf/sqoop-site.xml, you can configure sqoop.metastore.client.autoconnect.url with this address, so you do not have to supply --meta-connect to use a remote metastore. This parameter can also be modified to move the private metastore to a location on your filesystem other than your home directory. 

If you configure sqoop.metastore.client.enable.autoconnect with the value false, then you must explicitly supply --meta-connect. (To be verified in practice later.)
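
A hedged sketch of the hosted-metastore setup described above (host names are illustrative): start the shared metastore with the sqoop-metastore tool on one machine, then create jobs against it from any client:
# on the metastore host (listens on sqoop.metastore.server.port, 16000 by default)
$ sqoop metastore

# on a client; --meta-connect can be omitted if
# sqoop.metastore.client.autoconnect.url points at the same URI
$ sqoop job --create nightly-import \
    --meta-connect jdbc:hsqldb:hsql://metastore.example.com:16000/sqoop \
    -- import --connect jdbc:mysql://example.com/db --table mytable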

Looking at the sqoop-site.xml configuration file, I found the following settings:
<property>
  <name>sqoop.metastore.client.enable.autoconnect</name>
  <value>false</value>
  <description>If true, Sqoop will connect to a local metastore
    for job management when no other metastore arguments are
    provided.
  </description>
</property>


So this value had been set to false, which is why the job operations did not work.
<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
  <description>If true, allow saved passwords in the metastore.
  </description>
</property>

Passwords are only saved in the metastore when this value is true.

The sqoop-metastore tool runs a shared metastore service that hosts saved job definitions; its settings (port, storage location, and so on) also live in sqoop-site.xml.

The sqoop-merge tool lets you combine two datasets into one. Its arguments:
Argument              Description
--class-name <class>  Specify the name of the record-specific class to use during the merge job.
--jar-file <file>     Specify the name of the jar to load the record class from.
--merge-key <col>     Specify the name of a column to use as the merge key.
--new-data <path>     Specify the path of the newer dataset.
--onto <path>         Specify the path of the older dataset.
--target-dir <path>   Specify the target path for the output of the merge job.

For example:
$ sqoop merge --new-data newer --onto older --target-dir merged \

 --jar-file datatypes.jar --class-name Foo --merge-key id

This runs a MapReduce job in which records from the newer dataset take precedence over records from the older one.
The merge tool can be used with SequenceFile-, Avro-, and text-based incremental imports, and the new and old datasets must use the same record schema.
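
A hedged sketch of the typical merge workflow (paths, column names, and the timestamp are illustrative): pull the changed rows into a separate directory with a lastmodified incremental import, then merge them onto the previous snapshot keyed on the primary key:
$ sqoop import --connect jdbc:mysql://db.example.com/corp --table employees \
    --target-dir /data/employees_delta --incremental lastmodified \
    --check-column last_update --last-value "2012-10-01 00:00:00"

$ sqoop merge --new-data /data/employees_delta --onto /data/employees \
    --target-dir /data/employees_merged \
    --jar-file employees.jar --class-name employees --merge-key id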

The sqoop-codegen tool can regenerate the Java class for a table. If the generated Java source has been lost but the data has not, you can recreate it with a command like this:
$ sqoop codegen --connect jdbc:mysql://db.example.com/corp \

 --table employees
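
A hedged sketch of controlling where the regenerated sources and compiled jar end up (--outdir and --bindir are standard codegen options; the paths are illustrative). The resulting jar and class can then be passed to sqoop-merge via --jar-file and --class-name:
$ sqoop codegen --connect jdbc:mysql://db.example.com/corp --table employees \
    --outdir /tmp/sqoop-src --bindir /tmp/sqoop-lib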


sqoop-create-hive-table creates a Hive table that copies the definition of a table in the source database, for example:
$ sqoop create-hive-table --connect jdbc:mysql://db.example.com/corp \

 --table employees --hive-table emps


The sqoop-eval tool lets you quickly run a SQL statement against the database and shows the result on the console. For example:
$ sqoop eval --connect jdbc:mysql://db.example.com/corp \

 --query "SELECT * FROM employees LIMIT 10"


$ sqoop eval --connect jdbc:mysql://db.example.com/corp \

 -e "INSERT INTO foo VALUES(42, bar)"


sqoop-list-databases lists the databases available on a database server, for example:
$ sqoop list-databases --connect jdbc:mysql://database.example.com/

information_schema

employees


sqoop-list-tables lists the tables in a database, for example:
$ sqoop list-tables --connect jdbc:mysql://database.example.com/corp

employees

payroll_checks

job_descriptions

office_supplies


Databases currently supported by Sqoop:
Database    version  --direct support?  connect string matches
HSQLDB      1.8.0+   No                 jdbc:hsqldb:*//
MySQL       5.0+     Yes                jdbc:mysql://
Oracle      10.2.0+  No                 jdbc:oracle:*//
PostgreSQL  8.3+     Yes (import only)  jdbc:postgresql://
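
For the databases marked "Yes" in the --direct column, here is a hedged example of enabling direct mode (for MySQL this bypasses JDBC and uses mysqldump/mysqlimport under the hood):
$ sqoop import --connect jdbc:mysql://db.example.com/corp --table employees --direct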


