Hive中的lateralview、explode、reflect、行转列、列转行、窗口函...--688IT编程网

Hive中的lateralview、explode、reflect、⾏转列、列转⾏、窗

⼝函。。。

⽬录

⼀、使⽤explode函数将hive表中的Map和Array字段数据进⾏拆分

lateral view⽤于和split、explode等UDTF⼀起使⽤的，能将⼀⾏数据拆分成多⾏数据，在此基础上可以对拆分的数据进⾏聚合，lateral view ⾸先为原始表的每⾏调⽤UDTF，UDTF会把⼀⾏拆分成⼀⾏或者多⾏，lateral view再把结果组合，产⽣⼀个⽀持别名表的虚拟表。

其中explode还可以⽤于将hive⼀列中复杂的array或者map结构拆分成多⾏

需求：现在有数据格式如下

zhangsan child1,child2,child3,child4 k1:v1,k2:v2

lisi child5,child6,child7,child8 k3:v3,k4:v4

字段之间使⽤\t分割，需求将所有的child进⾏拆开成为⼀列

+----------+--+

| mychild |

+----------+--+

| child1 |

| child2 |

| child3 |

| child4 |

| child5 |

| child6 |

| child7 |

| child8 |

+----------+--+

将map的key和value也进⾏拆开，成为如下结果

+-------------------+---------------+

| mymapkey | mymapvalue |

+----------------+------------------+

| k1 | v1 |

| k2 | v2 |

| k3 | v3 |

| k4 | v4 |

+-------------------+---------------+

第⼀步：创建hive数据库

创建hive数据库

hive (default)> create database hive_explode;

hive (default)> use hive_explode;

第⼆步：创建hive表，然后使⽤explode拆分map和array

hive (hive_explode)> create table t3(name string,children array<string>,address Map<string,string>)

row format delimited fields terminated by '\t'

collection items terminated by ','

map keys terminated by ':' stored as textFile;

第三步：加载数据

node03执⾏以下命令创建表数据⽂件

mkdir -p /export/servers/hivedatas/

cd /export/servers/hivedatas/

vim maparray

zhangsan child1,child2,child3,child4 k1:v1,k2:v2

lisi child5,child6,child7,child8 k3:v3,k4:v4

hive表当中加载数据

hive (hive_explode)> load data local inpath '/export/servers/hivedatas/maparray' into table t3;

第四步：使⽤explode将hive当中数据拆开

将array当中的数据拆分开

hive (hive_explode)> SELECT explode(children) AS myChild FROM t3;

将map当中的数据拆分开

hive (hive_explode)> SELECT explode(address) AS (myMapKey, myMapValue) FROM t3;

0: jdbc:hive2://node03:10000> select explode(children) as mychild from t3;

+----------+--+

| mychild |

+----------+--+

| child1 |

| child2 |

| child3 |

| child4 |

| child5 |

| child6 |

| child7 |

| child8 |

+----------+--+

8 rows selected (0.128 seconds)

0: jdbc:hive2://node03:10000> select name,explode(children) as mychild from t3;

Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressio

0: jdbc:hive2://node03:10000> SELECT explode(address) AS (myMapKey, myMapValue) FROM t3;

+-----------+-------------+--+

| mymapkey | mymapvalue |

+-----------+-------------+--+

| k1 | v1 |

| k2 | v2 |

| k3 | v3 |

| k4 | v4 |

+-----------+-------------+--+

⼆、使⽤explode函数拆分json字符串数据

需求：现在有⼀些数据格式如下：

a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"}, {"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},

{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]

其中字段与字段之间的分隔符是 “ | ”

我们要解析得到所有的monthSales对应的值为以下这⼀列（⾏转列）

4900

2090

6987

第⼀步：创建hive表

hive (hive_explode)> create table explode_lateral_view

(`area` string,

`goods_id` string,

`sale_info` string)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '|'

STORED AS textfile;

第⼆步：准备数据并加载数据

准备数据如下

cd /export/servers/hivedatas

vim explode_json

a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"}, {"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},

{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]

加载数据到hive表当中去

hive (hive_explode)> load data local inpath '/export/servers/hivedatas/explode_json' overwrite into table explode_lateral_view;

第三步：使⽤explode拆分Array

hive (hive_explode)> select explode(split(goods_id,',')) as goods_id from explode_lateral_view;

第四步：使⽤explode拆解Map

hive (hive_explode)> select explode(split(area,',')) as area from explode_lateral_view;

第五步：拆解json字段

hive (hive_explode)> select explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')) as sale_info from explode_lateral_view;

然后我们想⽤get_json_object来获取key为monthSales的数据：

hive (hive_explode)> select get_json_object(explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\ {')),'$.monthSales') as sale_info from explode_lateral_view;

然后出现异常FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

UDTF explode不能写在别的函数内

如果你这么写，想查两个字段，select explode(split(area,',')) as area,good_id from explode_lateral_view;

会报错FAILED: SemanticException 1:40 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'good_id'

使⽤UDTF的时候，只⽀持⼀个字段，这时候就需要LATERAL VIEW出场了

三、配合LATERAL VIEW使⽤

配合lateral view查询多个字段

hive (hive_explode)> select goods_id2,sale_info from explode_lateral_view LATERAL VIEW explode(split(goods_id,','))goods as goods_id2;

其中LATERAL VIEW explode(split(goods_id,','))goods相当于⼀个虚拟表，与原表explode_lateral_view笛卡尔积关联。

也可以多重使⽤

hive (hive_explode)> select goods_id2,sale_info,area2

from explode_lateral_view

LATERAL VIEW explode(split(goods_id,','))goods as goods_id2

LATERAL VIEW explode(split(area,','))area as area2;

也是三个表笛卡尔积的结果

最终，我们可以通过下⾯的句⼦，把这个json格式的⼀⾏数据，完全转换成⼆维表的⽅式展现：

hive (hive_explode)> select

get_json_object(concat('{',sale_info_1,'}'),'$.source') as source,

get_json_object(concat('{',sale_info_1,'}'),'$.monthSales') as monthSales,

get_json_object(concat('{',sale_info_1,'}'),'$.userCount') as userCount,

get_json_object(concat('{',sale_info_1,'}'),'$.score') as score

from explode_lateral_view

LATERAL VIEW explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{'))sale_info as sale_info_1;

总结：

Lateral View通常和UDTF⼀起出现，为了解决UDTF不允许在select字段的问题。

Multiple Lateral View可以实现类似笛卡尔乘积。

Outer关键字可以把不输出的UDTF的空结果，输出成NULL，防⽌丢失数据。

四、列转⾏

1．相关函数说明

CONCAT(string A/col, string B/col…)：返回输⼊字符串连接后的结果，⽀持任意个输⼊字符串;

CONCAT_WS(separator, str1, str2,...)：它是⼀个特殊形式的 CONCAT()。第⼀个参数剩余参数间的分隔符。分隔符可以是与剩余参数⼀样的字符串。如果分隔符是 NULL，返回值也将为 NULL。这个函数会跳过分隔符参数后的任何 NULL 和空字符串。分隔符将被加到被连接的字符串之间;

COLLECT_SET(col)：函数只接受基本数据类型，它的主要作⽤是将某字段的值进⾏去重汇总，产⽣array类型字段。

2．数据准备

表6-6 数据准备

name constellation blood_type

孙悟空⽩⽺座A

⽼王射⼿座A

宋宋⽩⽺座B

猪⼋戒⽩⽺座A

凤射⼿座A

3．需求

把星座和⾎型⼀样的⼈归类到⼀起。结果如下：

射⼿座,A ⽼王|凤

⽩⽺座,A 孙悟空|猪⼋戒

⽩⽺座,B 宋宋

4．创建本地，导⼊数据

node03服务器执⾏以下命令创建⽂件，注意数据使⽤\t进⾏分割

cd /export/servers/hivedatas

孙悟空⽩⽺座 A

java中split的用法

⽼王射⼿座 A

宋宋⽩⽺座 B

猪⼋戒⽩⽺座 A

凤射⼿座 A

5．创建hive表并导⼊数据

创建hive表并加载数据

hive (hive_explode)> create table person_info(

name string,

constellation string,

blood_type string)

row format delimited fields terminated by "\t";

688IT编程网

Hive中的lateralview、explode、reflect、行转列、列转行、窗口函...

发表评论

推荐文章

应用程序的安全检测方法、装置、电子设备和存储介质

nginx map用法正则

VBA之正则表达式(1)--基础篇

Prometheus监控学习笔记之初识PromQL

关于PHP中的webshell

热门文章

m函数数字提取

jest断言方法大全

中兴ZXSEC US 管理员手册

keras系列(一):参数设置

Qt从QString中提取出数字

element input 金额千分位格式化

freemaker 参数解析正则

C#正则验证数字

form表单验证正则

scanf正则表达式用法

grafana value的正则表达式

Android平台浮点数运算应用

js-(JS正则表达式验证数字)

判断Python输入是否是整数,字符,或浮点数

c语言 sscanf 正则规则

从文本中提取数值技巧

js将整数转换成两位浮点数的方法

vue正则限制浮点数

8到20的结尾的正则

shell 正则表达式最后一行

最新文章

应用程序的安全检测方法、装置、电子设备和存储介质

VBA之正则表达式(1)--基础篇

代码编辑的辅助方法、装置及电子设备

SHELL查字符串中包含字符的命令

String方法中replace和replaceAll的区别详解(源码分析)

双字节符号正则

标签列表

688IT编程网

Hive中的lateralview、explode、reflect、行转列、列转行、窗口函...

发表评论

推荐文章

应用程序的安全检测方法、装置、电子设备和存储介质

nginx map用法 正则

VBA之正则表达式(1)--基础篇

Prometheus监控学习笔记之初识PromQL

关于PHP中的webshell

热门文章

m函数数字提取

jest断言方法大全

中兴ZXSEC US 管理员手册

keras系列(一):参数设置

Qt从QString中提取出数字

element input 金额千分位格式化

freemaker 参数解析正则

C#正则验证数字

form表单验证正则

scanf正则表达式用法

grafana value的正则表达式

Android平台浮点数运算应用

js-(JS正则表达式验证数字)

判断Python输入是否是整数,字符,或浮点数

c语言 sscanf 正则规则

从文本中提取数值技巧

js将整数转换成两位浮点数的方法

vue正则限制浮点数

8到20的结尾的正则

shell 正则表达式 最后一行

最新文章

应用程序的安全检测方法、装置、电子设备和存储介质

VBA之正则表达式(1)--基础篇

代码编辑的辅助方法、装置及电子设备

SHELL查字符串中包含字符的命令

String方法中replace和replaceAll的区别详解(源码分析)

双字节符号正则

标签列表

nginx map用法正则

shell 正则表达式最后一行