Hive表⽣成函数explode讲解
Hive中的表分析函数接受零个或多个输⼊,然后产⽣多列或多⾏输出。
explode函数以array类型数据输⼊,然后对数组中的数据进⾏迭代,返回多⾏结果,⼀⾏⼀个数组元素值
ARRAY函数是将⼀列输⼊转换成⼀个数组输出。
h ive (jimdb)> SELECT ARRAY(1,2,3) FROM dual;
OK
_c0
[1,2,3]
Time taken: 0.448 seconds, Fetched: 1 row(s)
SELECT explode(array(1,2,3)) AS element;
hive (jimdb)> SELECT explode(array(1,2,3)) AS element;
OK
element
1
2
3
Time taken: 0.327 seconds, Fetched: 3 row(s)
创建⼀张测试表单:
CREATE TABLE udtf_test(name STRING,subordinates ARRAY<STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';
hive (jimdb)> select * from udtf_test;
OK
udtf_test.name udtf_test.subordinates
jim5 ["james","datacloase"]
jim4 ["james","datacloase"]
jim3 ["james","datacloase"]
jim2 ["james","datacloase"]
jim ["james","datacloase"]
Time taken: 0.348 seconds, Fetched: 5 row(s)
我执⾏下⾯这条语句,希望将字段subordinates拆分开,新⽣成⼀列,但是语句报错:
select name,explode(subordinates) from udtf_test;
hive (jimdb)> select name,explode(subordinates) from udtf_test;hive 字符串转数组
FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions
Hive的表⽣成函数只是⽣成了⼀种数据的展⽰⽅式,⽽⽆法在表中产⽣⼀个其他的列。
因此这块需要使⽤LATERAL VIEW功能来进⾏处理。LATERAL VIEW将explode⽣成的结果当做⼀个视图来处理。
SELECT name, sub
FROM udtf_test
LATERAL VIEW explode(subordinates) subView AS sub;
在这⾥LATERAL VIEW 是将 explode结果转换成⼀个视图subView,在视图中的单列列名定义为sub,然后在查询的时候引⽤这个列名就能够查到。
hive (jimdb)> SELECT name, sub
> FROM udtf_test
> LATERAL VIEW explode(subordinates) subView AS sub;
OK
name sub
jim5 james
jim5 datacloase
jim4 james
jim4 datacloase
jim3 james
jim3 datacloase
jim2 james
jim2 datacloase
jim james
jim datacloase
Time taken: 0.399 seconds, Fetched: 10 row(s)
创建⼀张测试表:
drop table test1;
create table test1(name string,phonenumber string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
--需求是过滤掉该表中电话号码中0-9的某个数字在电话号码中出现6次及以上的号码,然后将正常的号码返回。hive (jimdb)> select * from test1;
OK
test1.name test1.phonenumber
'jim he' '181********'
'xiaosong' '183********'
'jingxianghua' '181********'
'donghualing' '17191919999'
执⾏语句如下:
SELECT c.name,c.phonenumber
FROM
(SELECT dd.name,dd.phonenumber,MAX(dd)
FROM (SELECT d.name,d.phonenumber,d.m, COUNT(*) cn
FROM (SELECT name,phonenumber,m FROM test1 LATERAL VIEW explode(split(phonenumber,'')) n AS m) d GROUP BY d.name,d.phonenumber,d.m) dd
GROUP BY dd.name,dd.phonenumber HAVING MAX(dd) <=5) c;
hive (jimdb)> SELECT c.name,c.phonenumber
> FROM
> (SELECT dd.name,dd.phonenumber,MAX(dd)
> FROM (SELECT d.name,d.phonenumber,d.m, COUNT(*) cn
> FROM (SELECT name,phonenumber,m FROM test1 LATERAL VIEW explode(split(phonenumber,'')) n AS m) d > GROUP BY d.name,d.phonenumber,d.m) dd
> GROUP BY dd.name,dd.phonenumber HAVING MAX(dd) <=5) c;
Query ID = hadoop_20180611200632_14d3d30b-e64f-4aee-a7ca-fffa66049890
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
educers.ducer=<number>
In order to limit the maximum number of reducers:
educers.max=<number>
In order to set a constant number of reducers:
set duces=<number>
Job running in-process (local Hadoop)
2018-06-11 20:06:35,732 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1118441439_0004
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 3004 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
c.name c.phonenumber
'jim he' '181********'
'xiaosong' '183********'
Time taken: 2.872 seconds, Fetched: 2 row(s)

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。