Parsing Nested JSON Data with Logstash

Nested JSON Data

JSON Format 1

Given the following JSON log (the value of `position` is a nested JSON object):
{
"RequestTime":1637737587605,
"timestamp":"2021-11-24T15:06:42.681Z",
"position":{
"LogType":"请求⽇志",
"TopDirectory":"stream_ad_v1",
"RequestIp":"127.0.0.1"
}
}
We want to parse it into:
{
"RequestIp" => "127.0.0.1",
"TopDirectory" => "stream_ad_v1",
"timestamp" => "2021-11-24T15:06:42.681Z",
"RequestTime" => 1637737587605,
"LogType" => "请求⽇志"
}
The approach:
1. Create a new field (position_su) and copy the value of position into it.
2. Run a JSON parse on the new field.
3. Remove the now-redundant fields (optional).
The full configuration:

input {
  stdin {                      # input plugin truncated in the original; stdin assumed
    codec => json {
      charset => "UTF-8"
    }
  }
}
filter {
  mutate {
    add_field => { "position_su" => "%{position}" }  # 1. create the new field and copy position into it
  }
  json {
    source => "position_su"                          # 2. parse the new field as JSON
    remove_field => ["position_su", "position"]      # 3. remove the redundant fields (optional)
  }
}
output {
  stdout { codec => rubydebug }
}
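The three filter steps can be sketched in plain Python. This is a simulation of what the pipeline does to one event, not Logstash itself:

```python
import json

# The event as it arrives, with "position" as a nested object.
event = {
    "RequestTime": 1637737587605,
    "timestamp": "2021-11-24T15:06:42.681Z",
    "position": {
        "LogType": "请求日志",
        "TopDirectory": "stream_ad_v1",
        "RequestIp": "127.0.0.1",
    },
}

# Step 1: add_field — serialize "position" into a new string field.
position_su = json.dumps(event["position"])
# Step 2: json filter — parse that string and merge its keys into the top level.
event.update(json.loads(position_su))
# Step 3: remove_field — drop the redundant nested field.
del event["position"]

print(event)
```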
JSON Format 2

Given the following JSON (the value of `position` is a JSON array with a single element):
{
"RequestTime":1637737587605,
"timestamp":"2021-11-24T15:06:42.681Z",
"position":[
{
"LogType":"请求⽇志",
"TopDirectory":"stream_ad_v1",
时间正则表达式java"RequestIp":"127.0.0.1"
}
]
}
If we reuse the method above unchanged, the parse fails, as shown in the figure.
In that case, hold off on the JSON conversion: first strip the "[" and "]" characters, and only then parse the JSON:
input {
  stdin { }                    # input plugin truncated in the original; stdin assumed
}
filter {
  mutate {
    gsub => [
      "message", "\[", "",
      "message", "\]", ""
    ]
  }
  json { source => "message" }
  mutate {
    add_field => { "position_su" => "%{position}" }  # create the new field and copy position into it
  }
  json {
    source => "position_su"                          # parse the new field
    remove_field => ["position_su", "position"]      # remove the redundant fields (optional)
  }
}
output {
  stdout { codec => rubydebug }
}
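The bracket-stripping trick can be sketched in Python (again a simulation of the event processing, not Logstash):

```python
import json

# Raw message with a one-element array under "position".
message = ('{"RequestTime":1637737587605,'
           '"position":[{"TopDirectory":"stream_ad_v1","RequestIp":"127.0.0.1"}]}')

# gsub step: delete every "[" and "]", collapsing the one-element
# array into a plain nested object.
stripped = message.replace("[", "").replace("]", "")

event = json.loads(stripped)          # json { source => "message" }
event.update(event.pop("position"))   # flatten the nested object into the top level
print(event)
```

Note that this only works because the array has exactly one element and no "[" or "]" characters appear inside any string values; otherwise the stripped text is no longer valid JSON.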
JSON Format 3

Given the following JSON (the value of `position` is a JSON array with n elements):
{
"RequestTime":1637737587605,
"timestamp":"2021-11-24T15:06:42.681Z",
"position":[
{
"LogType":"请求⽇志",
"TopDirectory":"stream_ad_v1",
"RequestIp":"127.0.0.1"
},
{
"LogType":"请求⽇志",
"TopDirectory":"stream_ad_v2",
"RequestIp":"127.0.0.1"
},
{
"LogType":"请求⽇志",
"TopDirectory":"stream_ad_v3",
"RequestIp":"127.0.0.1"
}
]
}
Here, stripping the "[]" characters no longer solves the problem. Looking at the output after Logstash processes this JSON (shown in the figure), we find that Logstash can fetch array elements by index, so the configuration can be reworked as follows:
input {
  stdin {                      # input plugin truncated in the original; stdin assumed
    codec => json {
      charset => "UTF-8"
    }
  }
}
filter {
  mutate {
    add_field => { "position_json" => "%{[position][0]}" }  # fetch the first array element by index
  }
  mutate {
    add_field => { "position_su" => "%{position_json}" }    # copy it into the field to be parsed
  }
  json {
    source => "position_su"                                      # parse the new field
    remove_field => ["position_su", "position_json", "position"] # remove the redundant fields (optional)
  }
}
output {
  stdout { codec => rubydebug }
}
The execution result is shown in the figure:
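The index-based extraction can be sketched in Python. Note that only element 0 survives this transformation; handling all n elements would need something like Logstash's split filter to fan the array out into separate events:

```python
import json

event = {
    "RequestTime": 1637737587605,
    "position": [
        {"LogType": "请求日志", "TopDirectory": "stream_ad_v1", "RequestIp": "127.0.0.1"},
        {"LogType": "请求日志", "TopDirectory": "stream_ad_v2", "RequestIp": "127.0.0.1"},
        {"LogType": "请求日志", "TopDirectory": "stream_ad_v3", "RequestIp": "127.0.0.1"},
    ],
}

# %{[position][0]} — fetch the first array element by index.
position_json = json.dumps(event["position"][0])
# Parse it, merge its keys into the top level, then drop the redundant fields.
event.update(json.loads(position_json))
del event["position"]
print(event)
```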
Note: you may also run into more complex variants like the one below, where LogType itself nests one or more JSON arrays.
{
"RequestTime":1637737587605,
"timestamp":"2021-11-24T15:06:42.681Z",
"position":[
{
"LogType":[
{
"a":"请求⽇志a"
},
{
"b":"请求⽇志b"
},
{
"c":"请求⽇志c"
}
],
"TopDirectory":"stream_ad_v1",
"RequestIp":"127.0.0.1"
},
{
"LogType":"请求⽇志",
"TopDirectory":"stream_ad_v2",
"RequestIp":"127.0.0.1"
},
{
"LogType":"请求⽇志",
"TopDirectory":"stream_ad_v3",
"RequestIp":"127.0.0.1"
}
]
}
The only approach I can think of for this case is to peel it apart layer by layer, pulling the data out level by level. In real work, though, a requirement like this is unreasonable: operations of this kind should be done in Elasticsearch instead.
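One way to make "peel it apart layer by layer" concrete is a recursive flatten, sketched here in plain Python. The dotted key naming is my own convention for illustration, not something Logstash produces:

```python
def flatten(node, prefix=""):
    """Recursively peel nested objects/arrays into flat dotted keys."""
    out = {}
    if isinstance(node, dict):
        for k, v in node.items():
            out.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            out.update(flatten(v, f"{prefix}{i}."))
    else:
        out[prefix.rstrip(".")] = node  # leaf value: emit one flat key
    return out

event = {
    "RequestTime": 1637737587605,
    "position": [
        {"LogType": [{"a": "请求日志a"}, {"b": "请求日志b"}],
         "TopDirectory": "stream_ad_v1"},
        {"LogType": "请求日志", "TopDirectory": "stream_ad_v2"},
    ],
}

flat = flatten(event)
print(flat)  # keys like "position.0.LogType.0.a"
```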