Hive中的lateralview、explode、reflect、⾏转列、列转⾏、窗
⼝函。。。
⽬录
⼀、使⽤explode函数将hive表中的Map和Array字段数据进⾏拆分
lateral view⽤于和split、explode等UDTF⼀起使⽤的,能将⼀⾏数据拆分成多⾏数据,在此基础上可以对拆分的数据进⾏聚合,lateral view ⾸先为原始表的每⾏调⽤UDTF,UDTF会把⼀⾏拆分成⼀⾏或者多⾏,lateral view再把结果组合,产⽣⼀个⽀持别名表的虚拟表。
其中explode还可以⽤于将hive⼀列中复杂的array或者map结构拆分成多⾏
需求:现在有数据格式如下
zhangsan      child1,child2,child3,child4    k1:v1,k2:v2
lisi  child5,child6,child7,child8    k3:v3,k4:v4
字段之间使⽤\t分割,需求将所有的child进⾏拆开成为⼀列
+----------+--+
| mychild  |
+----------+--+
| child1    |
| child2    |
| child3    |
| child4    |
| child5    |
| child6    |
| child7    |
| child8    |
+----------+--+
将map的key和value也进⾏拆开,成为如下结果
+-------------------+---------------+
| mymapkey          | mymapvalue    |
+----------------+------------------+
| k1                | v1            |
| k2                | v2            |
| k3                | v3            |
| k4                | v4            |
+-------------------+---------------+
第⼀步:创建hive数据库
创建hive数据库
hive (default)> create database hive_explode;
hive (default)> use hive_explode;
第⼆步:创建hive表,然后使⽤explode拆分map和array
hive (hive_explode)> create  table t3(name string,children array<string>,address Map<string,string>)
row format delimited fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':' stored as textFile;
第三步:加载数据
node03执⾏以下命令创建表数据⽂件
mkdir -p /export/servers/hivedatas/
cd /export/servers/hivedatas/
vim maparray
zhangsan  child1,child2,child3,child4 k1:v1,k2:v2
lisi  child5,child6,child7,child8 k3:v3,k4:v4
hive表当中加载数据
hive (hive_explode)> load data local inpath '/export/servers/hivedatas/maparray' into table t3;
第四步:使⽤explode将hive当中数据拆开
将array当中的数据拆分开
hive (hive_explode)> SELECT explode(children) AS myChild FROM t3;
将map当中的数据拆分开
hive (hive_explode)> SELECT explode(address) AS (myMapKey, myMapValue) FROM t3;
0: jdbc:hive2://node03:10000> select explode(children) as mychild from t3;
+----------+--+
| mychild  |
+----------+--+
| child1  |
| child2  |
| child3  |
| child4  |
| child5  |
| child6  |
| child7  |
| child8  |
+----------+--+
8 rows selected (0.128 seconds)
0: jdbc:hive2://node03:10000> select name,explode(children) as mychild from t3;
Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressio
0: jdbc:hive2://node03:10000> SELECT explode(address) AS (myMapKey, myMapValue) FROM t3;
+-----------+-------------+--+
| mymapkey  | mymapvalue  |
+-----------+-------------+--+
| k1        | v1          |
| k2        | v2          |
| k3        | v3          |
| k4        | v4          |
+-----------+-------------+--+
⼆、使⽤explode函数拆分json字符串数据
需求:现在有⼀些数据格式如下:
a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"}, {"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},
{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]
其中字段与字段之间的分隔符是 “ | ”
我们要解析得到所有的monthSales对应的值为以下这⼀列(⾏转列)
4900
2090
6987
第⼀步:创建hive表
hive (hive_explode)> create table explode_lateral_view
(`area` string,
`goods_id` string,
`sale_info` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS textfile;
第⼆步:准备数据并加载数据
准备数据如下
cd /export/servers/hivedatas
vim explode_json
a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"}, {"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},
{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]
加载数据到hive表当中去
hive (hive_explode)> load data local inpath '/export/servers/hivedatas/explode_json' overwrite into table explode_lateral_view;
第三步:使⽤explode拆分Array
hive (hive_explode)> select explode(split(goods_id,',')) as goods_id from explode_lateral_view;
第四步:使⽤explode拆解Map
hive (hive_explode)> select explode(split(area,',')) as area from explode_lateral_view;
第五步:拆解json字段
hive (hive_explode)> select explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')) as  sale_info from explode_lateral_view;
然后我们想⽤get_json_object来获取key为monthSales的数据:
hive (hive_explode)> select get_json_object(explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\ {')),'$.monthSales') as  sale_info from explode_lateral_view;
然后出现异常FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions
UDTF explode不能写在别的函数内
如果你这么写,想查两个字段,select explode(split(area,',')) as area,good_id from explode_lateral_view;
会报错FAILED: SemanticException 1:40 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'good_id'
使⽤UDTF的时候,只⽀持⼀个字段,这时候就需要LATERAL VIEW出场了
三、配合LATERAL  VIEW使⽤
配合lateral view查询多个字段
hive (hive_explode)> select goods_id2,sale_info from explode_lateral_view LATERAL VIEW explode(split(goods_id,','))goods as goods_id2;
其中LATERAL VIEW explode(split(goods_id,','))goods相当于⼀个虚拟表,与原表explode_lateral_view笛卡尔积关联。
也可以多重使⽤
hive (hive_explode)> select goods_id2,sale_info,area2
from explode_lateral_view
LATERAL VIEW explode(split(goods_id,','))goods as goods_id2
LATERAL VIEW explode(split(area,','))area as area2;
也是三个表笛卡尔积的结果
最终,我们可以通过下⾯的句⼦,把这个json格式的⼀⾏数据,完全转换成⼆维表的⽅式展现:
hive (hive_explode)> select
get_json_object(concat('{',sale_info_1,'}'),'$.source') as source,
get_json_object(concat('{',sale_info_1,'}'),'$.monthSales') as monthSales,
get_json_object(concat('{',sale_info_1,'}'),'$.userCount') as userCount,
get_json_object(concat('{',sale_info_1,'}'),'$.score') as score
from explode_lateral_view
LATERAL VIEW explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{'))sale_info as sale_info_1;
总结:
Lateral View通常和UDTF⼀起出现,为了解决UDTF不允许在select字段的问题。
Multiple Lateral View可以实现类似笛卡尔乘积。
Outer关键字可以把不输出的UDTF的空结果,输出成NULL,防⽌丢失数据。
四、列转⾏
1.相关函数说明
CONCAT(string A/col, string B/col…):返回输⼊字符串连接后的结果,⽀持任意个输⼊字符串;
CONCAT_WS(separator, str1, str2,...):它是⼀个特殊形式的 CONCAT()。第⼀个参数剩余参数间的分隔符。分隔符可以是与剩余参数⼀样的字符串。如果分隔符是 NULL,返回值也将为 NULL。这个函数会跳过分隔符参数后的任何 NULL 和空字符串。分隔符将被加到被连接的字符串之间;
COLLECT_SET(col):函数只接受基本数据类型,它的主要作⽤是将某字段的值进⾏去重汇总,产⽣array类型字段。
2.数据准备
表6-6 数据准备
name constellation blood_type
孙悟空⽩⽺座A
⽼王射⼿座A
宋宋⽩⽺座B
猪⼋戒⽩⽺座A
凤射⼿座A
3.需求
把星座和⾎型⼀样的⼈归类到⼀起。结果如下:
射⼿座,A            ⽼王|凤
⽩⽺座,A            孙悟空|猪⼋戒
⽩⽺座,B            宋宋
4.创建本地,导⼊数据
node03服务器执⾏以下命令创建⽂件,注意数据使⽤\t进⾏分割
cd /export/servers/hivedatas
孙悟空⽩⽺座  A
java中split的用法
⽼王射⼿座  A
宋宋⽩⽺座  B
猪⼋戒⽩⽺座  A
凤射⼿座  A
5.创建hive表并导⼊数据
创建hive表并加载数据
hive (hive_explode)> create table person_info(
name string,
constellation string,
blood_type string)
row format delimited fields terminated by "\t";

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。