hive如何使⽤json格式的数据
1. 查看当前hive⽀持的json函数
show functions like "*json*";
OK
get_json_object
json_tuple
Time taken: 0.006 seconds, Fetched: 2 row(s)
get_json_object
describe function get_json_object;
OK
get_json_object(json_txt, path) - Extract a json object from path
解析json的字符串json_txt,返回path指定的内容。如果输⼊的json字符串⽆效,那么返回NULL。
例如:
select  get_json_object('{"store":{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}}, "email":"amy@only_for _json_udf_test","owner":"amy" } ','$.ownes');
OK
NULL
Time taken: 0.219 seconds, Fetched: 1 row(s)
没有ownes,返回值是null
select  get_json_object('{"store":{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}}, "email":"amy@only_for _json_udf_test","owner":"amy" } ','$.owner');
OK
amy
通过$.获取对应key的value 值
select  get_json_object('{"store":{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}}, "email":"amy@only_for _json_udf_test","owner":"amy" } ','$.store.fruit');
OK
[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}]
select  get_json_object('{"store":{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}}, "email":"amy@only_for _json_udf_test","owner":"amy" } ','$.store.fruit.weight');
OK
[8,9]
效果如上,如果是嵌套json,也通过.获取,如果并列结构获取的值将是list的形式
json_tuple
weight的几种形式
desc function json_tuple;
OK
json_tuple(jsonStr, p1, p2, ..., pn) - like get_json_object, but it takes multiple names and return a tuple. All the input parameters and output column types ar e string.
参数为⼀组键p1,p2……和JSON字符串,返回值的元组。该⽅法可以在⼀次调⽤中输⼊多个键
select json_tuple('{"store":{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}}, "email":"amy@only_for_json_ udf_test","owner":"amy" }',"store","email","owner")
OK
{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}}      amy@only_for_json_udf_test  amy
可以通过 “store”,“email”,“owner” 获取三个key 对应的value值,相⽐get_json_object显得更为⾼效率
select json_tuple('{"store":{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}}, "email":"amy@only_for_json_ udf_test","owner":"amy" }',"store.fruit","email","owner")
> ;
OK
NULL    amy@only_for_json_udf_test  amy
可以看出。不能通过.获取件套数据对应的值,如果key不存在,将返回null。
要获取json各列的数据的⽅法
raw001 {"age":20,"gender":"M","weight":"56"}
raw002 {"age":23,"gender":"M","weight":"70"}
raw003 {"age":34,"gender":"F","weight":"50"}
---------------------
select t.cl1 , get_json_object(t.cl2json,'$.age'), get_json_object(t.cl2json,'$.gender') from tmp_json_test t ;
select t1.cl1, t2.* from tmp_json_test t1 lateral view json_tuple(t1.cl2json, 'age', 'gender') t2 as c1, c2;
<property>
<name>hive.aux.jars.path</name>
<value>auxlib/json-serde-1.3.8-jar-with-dependencies.jar</value>
</property>
注意之前版本有bug,在使⽤hiveserver2的时候解析会实效,有类型转化的错误。
注意建表的时候 需要指定序列化对应的⽅式org.openx.data.jsonserde.JsonSerDe
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'ignore.malformed.json'='true',
'mapping.position'='where',
'mapping.ts'='timestamp')
‘ignore.malformed.json’=‘true’ 忽略坏数据为null,避免查询报错。
‘mapping.position’=‘where’, 将where字段列名映射为position字段,避免关键词作为列名使⽤不⽅便的问题,
‘mapping.ts’=‘timestamp’ 将timestamp映射为ts,作⽤和where相同,注意hive是⼤⼩写不敏感的,但是列名称必须和json数据结构中的列名相同,只是忽略⼤⼩写,如果不存在,则返回null值。
具体请参考
对应map这种 在union的时候,如果字段为null 如何拼接呢
map<string,string>
str_to_map(text[, delimiter1, delimiter2])
Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ‘,’ for delimiter1 and ‘=’ for delimiter2.

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。