Spark UDF in Practice: Parsing JSON
When we process JSON fields with Spark, we normally pin the data down with a schema. But JSON records often carry fields that need special handling a plain schema read cannot express, and for those we turn to a UDF.
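For context, the usual schema-constrained read looks roughly like the sketch below. This snippet is illustrative rather than code from the original post: it assumes an existing SparkSession named spark, and the field names mirror the sample record further down.

from pyspark.sql.types import StructType, StructField, LongType, StringType

# Only the fields declared in the schema survive the read; anything else
# is silently dropped. That is exactly why ad-hoc nested fields call for a UDF.
schema = StructType([
    StructField("final_score", LongType()),
    StructField("final_decision", StringType()),
    StructField("report_id", StringType()),
])
df = spark.read.json("1.json", schema=schema)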
Below is a worked example of parsing one JSON record.
The JSON source:
{"final_score":16, "risk_items":[{"item_id":3403925, "item_name":"7天内申请⼈在多个平台申请借款", "risk_level":"high", "group":"多平台借贷申请检测", "item _detail":{"discredit_times":null, "overdue_details":null, "platform_count":2, "court_details":null, "fraud_type":null, "platform_detail":["⼀般消费分期平台:1", "P2 P⽹贷:1"], "high_risk_areas":null, "hit_list_datas":null, "frequency_detail_list":null}},{"item_id":3403927,"item_name":"1个⽉内申请⼈在多个平台申请借款","ris k_level":"medium","group":"多平台借贷申请检测","item_detail":{"discredit_times":null,"overdue_details":null,"platform_count":2,"court_details":null,"fraud_typ e":null,"platform_detail":["⼀般消费分期平台:1","P2P⽹贷:1"],"high_risk_areas":null,"hit_list_datas":null,"
oracle数据库官方安装文档frequency_detail_list":null}},{"item_id":3403929,"item _name":"3个⽉内申请⼈在多个平台申请借款","risk_level":"medium","group":"多平台借贷申请检测","item_detail":{"discredit_times":null,"overdue_details":null, "platform_count":2,"court_details":null,"fraud_type":null,"platform_detail":["⼀般消费分期平台:1","P2P⽹贷:1"],"high_risk_areas":null,"hit_list_datas":null,"freq uency_detail_list":null}},{"item_id":3403931,"item_name":"6个⽉内申请⼈在多个平台申请借款","risk_level":"medium","group":"多平台借贷申请检测","item_de tail":{"discredit_times":null,"overdue_details":null,"platform_count":2,"court_details":null,"fraud_type":null,"platform_detail":["⼀般消费分期平台:1","P2P⽹贷:1 "],"high_risk_areas":null,"hit_list_datas":null,"frequency_detail_list":null}},{"item_id":3403935,"item_name":"18个⽉内申请⼈在多个平台申请借款","risk_level": "low","group":"多平台借贷申请检测","item_detail":{"discredit_times":null,"overdue_details":null,"platform_count":2,"court_details":null,"fraud_type":null,"platfo rm_detail":["⼀般消费分期平台:1","P2P⽹贷:1"],"high_risk_areas":null,"hit_list_datas":null,"frequency_detail_list":null}},{"item_id":3403937,"item_name":"24个⽉内申请⼈在多个平台申请借款","risk_level":"low","group":"多平台借贷申请检测","item_detail":{"discredit_times":null,"overdue_details":null,"platform_coun t":2,"co
urt_details":null,"fraud_type":null,"platform_detail":["⼀般消费分期平台:1","P2P⽹贷:1"],"high_risk_areas":null,"hit_list_datas":null,"frequency_detail_li st":null}},{"item_id":3403939,"item_name":"60个⽉以上申请⼈在多个平台申请借款","risk_level":"low","group":"多平台借贷申请检测","item_detail":{"discredit_ti mes":null,"overdue_details":null,"platform_count":2,"court_details":null,"fraud_type":null,"platform_detail":["⼀般消费分期平台:1","P2P⽹贷:1"],"high_risk_are as":null,"hit_list_datas":null,"frequency_detail_list":null}}],"final_decision":"Accept","report_time":1495377281000,"success":true,"report_id":"ER2017052122 344113605405","apply_time":1495377281000}
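Two things are worth noting about this source before touching it with Spark. First, spark.read.json in Spark 2.x expects one JSON document per line by default, so the record above (pretty-printed here for readability) should sit on a single line in 1.json, or be read with the multiLine option. Second, when no schema is supplied, Spark infers one; for the fields used below it comes out roughly as follows (an abbreviated sketch; always-null fields are inferred as string):

root
 |-- final_score: long
 |-- risk_items: array
 |    |-- element: struct
 |    |    |-- group: string
 |    |    |-- item_id: long
 |    |    |-- item_name: string
 |    |    |-- risk_level: string
 |    |    |-- item_detail: struct
 |    |    |    |-- platform_count: long
 |    |    |    |-- platform_detail: array<string>
 |    |    |    |-- ... (the always-null fields, as string)
 |-- final_decision: string
 |-- report_time: long
 |-- success: boolean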
From this record we want to extract the platform_count value for each of the items whose item_name is 7天内申请人在多个平台申请借款 (applied on multiple platforms within 7 days), 1个月内申请人在多个平台申请借款 (within 1 month), 3个月内申请人在多个平台申请借款 (within 3 months), and 6个月内申请人在多个平台申请借款 (within 6 months).
Let's go straight to the code:
# -*- coding: utf-8 -*-
# The coding declaration is required under Python 2.7 because the SQL
# below contains Chinese string literals.
import os

from pyspark.sql import SparkSession
from pyspark.sql.types import StringType


# The UDF: the extraction rule for the field. Walk risk_items, find the
# entry whose item_name matches, and return its platform_count.
def parse_1(risk_items, item_name):
    try:
        for item in risk_items:
            if item.item_name == item_name:
                # Cast to str so the value matches the StringType the
                # UDF is registered with below.
                return str(item.item_detail.platform_count)
    except Exception:
        return ""
    return ""


if __name__ == '__main__':
    # Work around a local Python environment quirk: the machine defaults to
    # Python 3, but this job runs on Python 2.7 (the interpreter name below
    # is environment-specific -- adjust it to your own install).
    os.environ["PYSPARK_PYTHON"] = "python2.7"

    spark = SparkSession \
        .builder \
        .appName("application") \
        .master("local") \
        .getOrCreate()

    # Register the UDF: the Spark SQL name is td_parse1, backed by parse_1
    spark.udf.register("td_parse1", parse_1, StringType())

    # Read the JSON data
    df = spark.read.json("1.json")

    # Register a temporary view so the DataFrame can be queried with SQL
    df.createOrReplaceTempView("tmp")

    resDf = spark.sql(
        """
        select
            final_score as td_final_score,
            td_parse1(risk_items, '7天内申请人在多个平台申请借款') as td_platform_count_7d,
            td_parse1(risk_items, '1个月内申请人在多个平台申请借款') as td_platform_count_1m,
            td_parse1(risk_items, '3个月内申请人在多个平台申请借款') as td_platform_count_3m,
            td_parse1(risk_items, '6个月内申请人在多个平台申请借款') as td_platform_count_6m
        from tmp
        """)

    # Show the result
    resDf.show()
    spark.stop()
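Before submitting the full job, parse_1 can also be sanity-checked directly: Row objects mimic the structs Spark hands to the UDF. This is a hypothetical quick test, not part of the original post:

from pyspark.sql import Row

# One fake risk item carrying only the two attributes parse_1 touches
item = Row(item_name=u"7天内申请人在多个平台申请借款",
           item_detail=Row(platform_count=2))
print(parse_1([item], u"7天内申请人在多个平台申请借款"))  # prints "2"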
The parsed result, one row with td_final_score plus the four td_platform_count_* columns, is shown in the screenshot below:
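As a design note, the same values could be pulled out without a UDF by exploding the array and filtering, as in the sketch below (illustrative code, not from the original post; df is the DataFrame read above). The trade-off is that this yields one row per matched item, whereas the UDF pivots everything into a single wide row, which is why the UDF is used here.

from pyspark.sql.functions import col, explode

# One output row per risk item; filter down to the item we care about
exploded = df.select("final_score", explode("risk_items").alias("r"))
res = exploded \
    .filter(col("r.item_name") == u"7天内申请人在多个平台申请借款") \
    .select(col("final_score").alias("td_final_score"),
            col("r.item_detail.platform_count").alias("td_platform_count_7d"))
res.show()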
