zl程序教程

您现在的位置是:首页 >  工具

当前栏目

【Hive】Apache Hive系列之Hive中表的操作

Apache 操作 系列 hive 中表
2023-09-11 14:22:06 时间


Hive支持 原始数据类型复杂类型
原始类型包括 数值型Boolean字符串时间戳等。
复杂类型包括 数组Map集合struct等。
下面是Hive数据类型的一个总结:

原始类型

类型描述字面量示例
BOOLEAN布尔值,可选值true/falsetrue
TINYINT1字节的有符号整数 -128~1271Y
SMALLINT2个字节的有符号整数,-32768~327671S
INT4个字节的带符号整数1
BIGINT8字节带符号整数1L
FLOAT4字节单精度浮点数1.0
DOUBLE8字节双精度浮点数1.0
DEICIMAL任意精度的带符号小数1.0
STRING字符串,变长“Abc”
VARCHAR变长字符串“Aac”
CHAR固定长度字符串“a”,’b’
BINARY字节数组
TIMESTAMP时间戳,纳秒精度122327493795
DATE日期‘2020-01-01

复杂类型

类型描述字面量示例
ARRAY有序的的同类型的集合array(1,2)
MAPkey-value,key必须为原始类型,value可以任意类型map(‘a’,1,’b’,2)
STRUCT字段集合,类型可以不同struct(‘1’,1,1.0), named_stract(‘col1’,’1’,’col2’,1,’clo3’,1.0)

案例测试

①创建库

0: jdbc:hive2://CentOS:10000> create database test;
No rows affected (0.324 seconds)
0: jdbc:hive2://CentOS:10000> desc database test;

②使用库

0: jdbc:hive2://CentOS:10000> use test;
No rows affected (0.092 seconds)

# 查看当前使用的是哪个数据库
0: jdbc:hive2://CentOS:10000> select current_database();
+---------+--+
|   _c0   |
+---------+--+
| test    |
+---------+--+
1 row selected (0.877 seconds)

③创建表

create table if not exists employee (
  id int,
  name varchar(32),
  age int,
  salary double,
  birthDay date,
  hobbies array<string>,
  address struct<street:string,country:string>,
  detail map<string,double>
);

④查看所有表

0: jdbc:hive2://CentOS:10000> show tables;
+-------------+--+
|  tab_name   |
+-------------+--+
|  employee   |
+-------------+--+
1 row selected (0.105 seconds)

⑤查看表的建表详情

0: jdbc:hive2://CentOS:10000> desc employee;
+-----------+---------------------------------------+----------+--+
| col_name  |               data_type               | comment  |
+-----------+---------------------------------------+----------+--+
| id        | int                                   |          |
| name      | varchar(32)                           |          |
| age       | int                                   |          |
| salary    | double                                |          |
| birthday  | date                                  |          |
| hobbies   | array<string>                         |          |
| address   | struct<street:string,country:string>  |          |
| detail    | map<string,double>                    |          |
+-----------+---------------------------------------+----------+--+
8 rows selected (0.318 seconds)

如果用户需要看到更多建表信息可以使用desc formatted 表名查看

数据格式

默认分隔符

在创建完表之后,需要准备数据,其中hive默认的字段分割符号如下,因此上述建表等价写法如下:

分隔符描述
\n对于文本文件而言,一行表示一条文本记录,因此可以使用\n表示
^A(Ctrl+A)用于分割字符的列,在创建表的时候可以使用八进制’\001’表示
^B(Ctrl+B)用于分割ARRAY、STRUCT或者MAP中的元素,在创建表的时候可以使用八进制’\002’表示
^C(Ctrl+C)用于分割MAP中的key,value,在创建表的时候可以使用八进制’\003’表示
create table if not exists employee_1 (
  id int,
  name varchar(32),
  age int,
  salary double,
  birthDay date,
  hobbies array<string>,
  address struct<street:string,country:string>,
  detail map<string,double>
)
row format delimited
fields terminated by '\001'
collection items terminated by '\002'
map keys terminated by '\003'
lines terminated by '\n'
stored as textfile;

按照如下数据格式输入数据,然后倒入到指定的表中:

1^Azhangsan^A18^A5000^A2020-10-10^ATV^BPlayGame^Bbasketball^Abeijing^BChina^A001^Cjianshe^B002^Cjiaotong

加载以上数据至hive
load data local inpath '本地路径' overwrite into table 表名

  • local-表示本地读取省略则表示从hdfs读取
  • overwrite - 表示覆盖表目录下的数据,如果去掉则保留原有的数据文件
0: jdbc:hive2://CentOS:10000> load data local inpath '/root/employee' overwrite into table employee_1;
INFO  : Loading data to table baizhi.employee_1 from file:/root/employee
INFO  : Table test.employee_1 stats: [numFiles=1, numRows=0, totalSize=175, rawDataSize=0]
No rows affected (0.622 seconds)

自定义分割符号

1,zhangsan,true,18,15000,TV|Game,001>建设|002>招商,china|bj 
2,lisi,true,28,15000,TV|Game,001>建设|002>招商,china|bj 
3,wangwu,false,38,5000,TV|Game,001>建设|002>招商,china|sh
create table if not exists user (
  id int,
  name varchar(32),
  sex boolean,
  age int,
  salary double,
  hobbies array<string>,
  card map<string,string>,
  address struct<street:string,country:string>
)
row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by '>'
lines terminated by '\n'
stored as textfile;

正则格式数据

正则格式检测网址:regex101.com

192.168.0.1 qq com.xx.xx.XxxService#xx 2018-10-10 10:10:00
192.168.2.1 qq com.xx.xx.XxxService#xx 2018-10-10 10:10:00
192.168.0.1 xx com.xx.xx.XxxService#xx 2018-10-10 10:10:00
192.168.202.1 qq com.xx.xx.XxxService#xx 2018-10-10 10:10:00
create table if not exists access(
     ip string,
     app varchar(32),
     service string,
     last_time timestamp
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex"="^(.*)\\s(.*)\\s(.*)\\s(.*\\s.*)"
);

CSV格式文件

1,apple,3,20.5
2,orange,2,21.5
3,pear,2,22.5
4,banana,1,23.0
CREATE TABLE if not exists product(
  id int,
  item string,
  count int,
  price double
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "escapeChar"    = "\\"
);

JSON格式文件

{"id":1,"name":"zhangsan","sex":true,"register":"1991-02-08","score":100.0}
{"id":2,"name":"lisi","sex":true,"register":"1991-02-08","score":80.0}
create table if not exists student1(
   id int,
   name varchar(32),
   sex boolean,
   register date,
   score double
)ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

需要添加jar文件 add jar /usr/apache-hive-1.2.2-bin/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.2.jar,删除的时候使用delete指令.