Sqoop import from MySQL to Hive table: GC overhead limit exceeded
1. Scenario description
When I used Sqoop to import a MySQL table into Hive, I got the following error:
19/07/12 15:41:35 WARN hcat.SqoopHCatUtilities: The Sqoop job can fail if types are not assignment compatible
19/07/12 15:41:35 WARN hcat.SqoopHCatUtilities: The HCatalog field submername has type string. Expected = varchar based on database column type : VARCHAR
19/07/12 15:41:35 WARN hcat.SqoopHCatUtilities: The Sqoop job can fail if types are not assignment compatible
19/07/12 15:41:35 INFO mapreduce.DataDrivenImportJob: Configuring mapper for HCatalog import job
19/07/12 15:41:35 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/07/12 15:41:35 INFO client.RMProxy: Connecting to ResourceManager at hadoop-namenode01/192.168.1.101:8032
19/07/12 15:41:35 WARN conf.HiveConf: HiveConf of name hive.server2.webui.host.port does not exist
19/07/12 15:41:35 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/07/12 15:42:52 INFO db.DBInputFormat: Using read commited transaction isolation
19/07/12 15:42:52 INFO mapreduce.JobSubmitter: number of splits:1
19/07/12 15:42:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1562229385371_50086
19/07/12 15:42:53 INFO impl.YarnClientImpl: Submitted application application_1562229385371_50086
19/07/12 15:42:53 INFO mapreduce.Job: The url to track the job: http://hadoop-namenode01:8088/proxy/application_1562229385371_50086/
19/07/12 15:42:53 INFO mapreduce.Job: Running job: job_1562229385371_50086
19/07/12 15:43:38 INFO hive.metastore: Closed a connection to metastore, current connections: 1
19/07/12 15:43:48 INFO mapreduce.Job: Job job_1562229385371_50086 running in uber mode : false
19/07/12 15:43:48 INFO mapreduce.Job: map 0% reduce 0%
19/07/12 15:48:31 INFO mapreduce.Job: Task Id : attempt_1562229385371_50086_m_000000_0, Status : FAILED
Error: GC overhead limit exceeded
Why does the Sqoop import throw this exception?
The answer is: during the import, the MySQL side (the JDBC driver, not Sqoop itself) fetches all the rows in one shot and tries to load everything into memory. This exhausts the mapper's heap and throws the error. To overcome it, you need to tell the driver to return the data in batches. Appending the parameters "?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true" to the JDBC connection string tells MySQL to fetch 10000 rows per batch.
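To avoid retyping the fetch parameters by hand, they can be appended programmatically. A minimal sketch (the helper function is my own, not part of Sqoop; the host and database names match the script below, and the 10000 fetch size is just a starting point to tune):

```shell
#!/bin/bash
# Build a MySQL JDBC URL with cursor-fetch (streaming) options appended.
# The fetch size is an illustrative default; tune it per table row size.
build_jdbc_url() {
  local host="$1" db="$2" fetch_size="${3:-10000}"
  echo "jdbc:mysql://${host}:3306/${db}?dontTrackOpenResources=true&defaultFetchSize=${fetch_size}&useCursorFetch=true"
}

build_jdbc_url lenmom-mysql inventory
```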
The script I used for the import is as follows:
File: sqoop_order_detail.sh
#!/bin/bash
/home/lenmom/sqoop-1.4.7/bin/sqoop import \
--connect jdbc:mysql://lenmom-mysql:3306/inventory \
--username root --password root \
--driver com.mysql.jdbc.Driver \
--table order_detail \
--hcatalog-database orc \
--hcatalog-table order_detail \
--hcatalog-partition-keys pt_log_d \
--hcatalog-partition-values 20190709 \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
-m 1
The target MySQL table has 10 billion records.
2. Solutions
2.1 Solution 1
Modify the MySQL JDBC URL to switch the driver to streaming (batched) reads by appending the following:
?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true
The defaultFetchSize can be adjusted to suit the specific situation. In my case, the whole script is:
#!/bin/bash
/home/lenmom/sqoop-1.4.7/bin/sqoop import \
--connect jdbc:mysql://lenmom-mysql:3306/inventory?dontTrackOpenResources=true\&defaultFetchSize=10000\&useCursorFetch=true\&useUnicode=yes\&characterEncoding=utf8 \
--username root --password root \
--driver com.mysql.jdbc.Driver \
--table order_detail \
--hcatalog-database orc \
--hcatalog-table order_detail \
--hcatalog-partition-keys pt_log_d \
--hcatalog-partition-values 20190709 \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
-m 1
Don't forget to escape each & in the shell script; alternatively, quote the whole JDBC URL instead of escaping:
#!/bin/bash
/home/lenmom/sqoop-1.4.7/bin/sqoop import \
--connect "jdbc:mysql://lenmom-mysql:3306/inventory?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true&useUnicode=yes&characterEncoding=utf8" \
--username root --password root \
--driver com.mysql.jdbc.Driver \
--table order_detail \
--hcatalog-database orc \
--hcatalog-table order_detail \
--hcatalog-partition-keys pt_log_d \
--hcatalog-partition-values 20190709 \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
-m 1
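To see why the escaping or quoting matters: in a shell, an unquoted & is a control operator that terminates the command, so everything after the first & would be silently dropped from the URL. A small demonstration (plain shell, no Sqoop involved) that both forms preserve the full parameter list:

```shell
#!/bin/bash
# Unquoted, the shell would treat '&' as "run in background";
# quoting the string or escaping each '&' keeps it literal.
quoted="jdbc:mysql://lenmom-mysql:3306/inventory?useCursorFetch=true&defaultFetchSize=10000"
escaped=jdbc:mysql://lenmom-mysql:3306/inventory?useCursorFetch=true\&defaultFetchSize=10000

# Both variables hold the identical, complete URL.
echo "$quoted"
echo "$escaped"
```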
2.2 Solution 2
sqoop import \
-Dmapreduce.map.memory.mb=6000 \
-Dmapreduce.map.java.opts=-Xmx1600m \
-Dmapreduce.task.io.sort.mb=4800
The above parameters need to be tuned together according to the data volume for a successful Sqoop pull. Note that the sort buffer (mapreduce.task.io.sort.mb) is allocated inside the map task's heap, so it should be kept below the -Xmx value.
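A common rule of thumb (my assumption here, not something Sqoop mandates) is to give the JVM heap roughly 80% of the YARN container size and keep the sort buffer well under the heap; Hadoop also rejects io.sort.mb values above 2047. A quick sketch of deriving the -D values:

```shell
#!/bin/bash
# Derive map-task JVM settings from the YARN container size.
# The 80% heap ratio and half-heap sort buffer are rule-of-thumb assumptions.
container_mb=6000
heap_mb=$(( container_mb * 80 / 100 ))   # leave headroom for off-heap usage
sort_mb=$(( heap_mb / 2 ))               # sort buffer lives inside the heap
[ "$sort_mb" -gt 2047 ] && sort_mb=2047  # Hadoop caps io.sort.mb at 2047 MB

echo "-Dmapreduce.map.memory.mb=${container_mb}"
echo "-Dmapreduce.map.java.opts=-Xmx${heap_mb}m"
echo "-Dmapreduce.task.io.sort.mb=${sort_mb}"
```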
2.3 Solution 3
Increase the number of mappers (the default is 4; it should not be greater than the number of DataNodes):
sqoop job --exec lenmom-job -- --num-mappers 8;
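Note that with more than one mapper, Sqoop needs a numeric split column (the table's primary key by default, or one given via --split-by) and divides its min..max range evenly among the mappers. A sketch of how the per-mapper ranges come out (the id range and mapper count are illustrative assumptions):

```shell
#!/bin/bash
# Illustrate how Sqoop partitions a numeric split column among mappers.
# min/max ids assume a dense numeric key over ~10 billion rows.
min_id=1
max_id=10000000000
mappers=8

span=$(( (max_id - min_id + 1 + mappers - 1) / mappers ))  # ceiling division
for (( i=0; i<mappers; i++ )); do
  lo=$(( min_id + i * span ))
  hi=$(( lo + span - 1 ))
  [ "$hi" -gt "$max_id" ] && hi=$max_id
  echo "mapper $i: id BETWEEN $lo AND $hi"
done
```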