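MongoDB array operations: $unwind, $group counting, and $sort
1. Problem description
Use the $unwind operator to "unwind" every element of an array inside a document, then use $group to count the elements per group, and finally use $sort to order the grouped results.
Import the data from images.json into the MongoDB server: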
mongoimport --drop -d test -c images images.json
Sample documents look like this:
> db.images.find()
{ "_id" : 3, "height" : 480, "width" : 640, "tags" : [ "kittens", "travel" ] }
{ "_id" : 1, "height" : 480, "width" : 640, "tags" : [ "cats", "sunrises", "kittens", "travel", "vacation", "work" ] }
{ "_id" : 0, "height" : 480, "width" : 640, "tags" : [ "dogs", "work" ] }
{ "_id" : 6, "height" : 480, "width" : 640, "tags" : [ "work" ] }
{ "_id" : 4, "height" : 480, "width" : 640, "tags" : [ "dogs", "sunrises", "kittens", "travel" ] }
{ "_id" : 5, "height" : 480, "width" : 640, "tags" : [ "dogs", "cats", "sunrises", "kittens", "work" ] }
{ "_id" : 7, "height" : 480, "width" : 640, "tags" : [ "dogs", "sunrises" ] }
{ "_id" : 8, "height" : 480, "width" : 640, "tags" : [ "dogs", "cats", "sunrises", "kittens", "travel" ] }
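Goal: count how many times each element of the tags array occurs across all documents. That is, how many times does "kittens" appear? How many times does "travel" appear? "dogs"? And so on.
2. Implementation steps
The solution uses MongoDB's aggregation pipeline (aggregate).
① Use $unwind to split the tags array. Each document yields one output document per array element: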
> db.images.aggregate(
... [
... {$unwind:"$tags"}
... ])
{ "_id" : 3, "height" : 480, "width" : 640, "tags" : "kittens" }
{ "_id" : 3, "height" : 480, "width" : 640, "tags" : "travel" }
{ "_id" : 1, "height" : 480, "width" : 640, "tags" : "cats" }
{ "_id" : 1, "height" : 480, "width" : 640, "tags" : "sunrises" }
{ "_id" : 1, "height" : 480, "width" : 640, "tags" : "kittens" }
{ "_id" : 1, "height" : 480, "width" : 640, "tags" : "travel" }
{ "_id" : 1, "height" : 480, "width" : 640, "tags" : "vacation" }
{ "_id" : 1, "height" : 480, "width" : 640, "tags" : "work" }
{ "_id" : 0, "height" : 480, "width" : 640, "tags" : "dogs" }
{ "_id" : 0, "height" : 480, "width" : 640, "tags" : "work" }
{ "_id" : 6, "height" : 480, "width" : 640, "tags" : "work" }
{ "_id" : 4, "height" : 480, "width" : 640, "tags" : "dogs" }
{ "_id" : 4, "height" : 480, "width" : 640, "tags" : "sunrises" }
.....
.....
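To make the effect of $unwind concrete, here is a minimal plain-JavaScript sketch of the same transformation on three of the sample documents. The helper name unwind is hypothetical, and this only illustrates the semantics for array fields; it is not how MongoDB implements the stage.

```javascript
// Three of the sample documents from the listing above.
const images = [
  { _id: 3, height: 480, width: 640, tags: ["kittens", "travel"] },
  { _id: 0, height: 480, width: 640, tags: ["dogs", "work"] },
  { _id: 6, height: 480, width: 640, tags: ["work"] },
];

// unwind (hypothetical helper): emit one copy of each document per array
// element, with the array field replaced by that single element.
// Simplification: documents whose field is missing or not an array yield nothing.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (Array.isArray(doc[field]) ? doc[field] : []).map((value) => ({
      ...doc,
      [field]: value,
    }))
  );
}

const unwound = unwind(images, "tags");
console.log(unwound.length); // 5
console.log(unwound[0]); // { _id: 3, height: 480, width: 640, tags: 'kittens' }
```

Note that _id is repeated across the unwound rows, which is why a later $group stage has to supply a new _id.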
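② Group the unwound documents by tag with $group
In the $group stage, _id specifies the grouping key (the field to "group by"), and the per-group count is stored in the num_of_tag field. {$sum:1} adds 1 for every document that falls into the group.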
> db.images.aggregate(
... [
... {$unwind:"$tags"},
... {$group:{_id:"$tags",num_of_tag:{$sum:1}}}
... ]
... )
{ "_id" : "dogs", "num_of_tag" : 49921 }
{ "_id" : "work", "num_of_tag" : 50070 }
{ "_id" : "vacation", "num_of_tag" : 50036 }
{ "_id" : "travel", "num_of_tag" : 49977 }
{ "_id" : "kittens", "num_of_tag" : 49932 }
{ "_id" : "sunrises", "num_of_tag" : 49887 }
{ "_id" : "cats", "num_of_tag" : 49772 }
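The counting done by {$group:{_id:"$tags",num_of_tag:{$sum:1}}} can be sketched in plain JavaScript as follows. The helper name groupCount is hypothetical and the input is a small hand-written set of unwound rows, so the counts differ from the full data set above.

```javascript
// A few unwound rows, reduced to the fields that matter for grouping.
const unwoundRows = [
  { _id: 3, tags: "kittens" },
  { _id: 3, tags: "travel" },
  { _id: 0, tags: "dogs" },
  { _id: 0, tags: "work" },
  { _id: 6, tags: "work" },
];

// groupCount (hypothetical helper): one output document per distinct value
// of `field`; num_of_tag is incremented by 1 per input row, like {$sum:1}.
function groupCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return Array.from(counts, ([key, n]) => ({ _id: key, num_of_tag: n }));
}

const tagCounts = groupCount(unwoundRows, "tags");
console.log(tagCounts);
```

As in MongoDB, the order of the resulting groups should not be relied on; an explicit sort is needed if order matters.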
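③ Use $project to drop the _id field we are not interested in (in effect, this renames _id to tags). This step is optional.
In the $project stage, _id:0 removes the _id field; tags:"$_id" exposes the value of _id under the field name tags; num_of_tag:1 keeps the num_of_tag field.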
> db.images.aggregate( [ {$unwind:"$tags"},{$group:{_id:"$tags",num_of_tag:{$sum:1}}},{$project:{_id:0,tags:"$_id",num_of_tag:1}} ])
{ "num_of_tag" : 49921, "tags" : "dogs" }
{ "num_of_tag" : 50070, "tags" : "work" }
{ "num_of_tag" : 50036, "tags" : "vacation" }
{ "num_of_tag" : 49977, "tags" : "travel" }
{ "num_of_tag" : 49932, "tags" : "kittens" }
{ "num_of_tag" : 49887, "tags" : "sunrises" }
{ "num_of_tag" : 49772, "tags" : "cats" }
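The renaming performed by {$project:{_id:0,tags:"$_id",num_of_tag:1}} amounts to a simple per-document mapping. A minimal plain-JavaScript sketch, using two of the grouped results shown above as input:

```javascript
// Two grouped results copied from the $group output above.
const groupedTags = [
  { _id: "dogs", num_of_tag: 49921 },
  { _id: "work", num_of_tag: 50070 },
];

// Keep num_of_tag, expose the old _id value as `tags`, and drop _id.
const projected = groupedTags.map(({ _id, num_of_tag }) => ({
  num_of_tag,
  tags: _id,
}));

console.log(projected);
```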
④ Use $sort to order the results by num_of_tag, descending (-1)
> db.images.aggregate( [ {$unwind:"$tags"},{$group:{_id:"$tags",num_of_tag:{$sum:1}}},{$project:{_id:0,tags:"$_id",num_of_tag:1}},{$sort:{num_of_tag:-1}} ])
{ "num_of_tag" : 50070, "tags" : "work" }
{ "num_of_tag" : 50036, "tags" : "vacation" }
{ "num_of_tag" : 49977, "tags" : "travel" }
{ "num_of_tag" : 49932, "tags" : "kittens" }
{ "num_of_tag" : 49921, "tags" : "dogs" }
{ "num_of_tag" : 49887, "tags" : "sunrises" }
{ "num_of_tag" : 49772, "tags" : "cats" }
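A sort on num_of_tag with direction -1 corresponds to a descending numeric comparison. As a plain-JavaScript sketch on three of the rows above:

```javascript
// Three projected rows from the output above, in their unsorted order.
const tagTotals = [
  { num_of_tag: 49921, tags: "dogs" },
  { num_of_tag: 50070, tags: "work" },
  { num_of_tag: 50036, tags: "vacation" },
];

// Copy first so the input stays untouched; b - a yields descending order,
// matching {$sort:{num_of_tag:-1}}.
const sortedTotals = [...tagTotals].sort((a, b) => b.num_of_tag - a.num_of_tag);

console.log(sortedTotals.map((t) => t.tags)); // [ 'work', 'vacation', 'dogs' ]
```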
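3. Summary
This article is based on a homework assignment from the MongoDB University course M101 for Java Developers. With a few Google searches and the official MongoDB documentation, building combined queries like this one in MongoDB is straightforward.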
Usage in a real project:
[Query with sort]
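// The collection name was missing from the original snippet; "collection" is a placeholder.
// Find calls that started after param and took more than 2000, slowest first.
var param = "2019-01-22 00:00:00";
db.collection.find({
    "startTime": { "$gt": param },
    "timeConsumed": { "$gt": 2000 }
}).sort({ timeConsumed: -1 })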
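// Same filter, returning only the listed fields and the 50 slowest calls.
// "collection" is again a placeholder for the missing collection name.
var param = "2019-01-22 00:00:00";
db.collection.find({
    "startTime": { "$gt": param },
    "timeConsumed": { "$gt": 2000 }
}, {
    "type": 1,
    "serverHost": 1,
    "className": 1,
    "methodName": 1,
    "startTime": 1,
    "endTime": 1,
    "args": 1,
    "timeConsumed": 1
}).sort({ timeConsumed: -1 }).limit(50)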
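The filter, sort, limit, and projection above can be sketched on an in-memory array. The records below are invented purely for illustration; only the field names come from the queries. The sketch also shows why the string comparison on startTime works: the timestamps use a fixed-width, zero-padded format, so lexicographic order matches chronological order.

```javascript
// Hypothetical log records; values are made up for illustration.
const logs = [
  { className: "OrderService", methodName: "create", startTime: "2019-01-22 08:00:01", timeConsumed: 3500 },
  { className: "OrderService", methodName: "list", startTime: "2019-01-21 23:59:59", timeConsumed: 9000 },
  { className: "UserService", methodName: "login", startTime: "2019-01-23 10:15:00", timeConsumed: 1200 },
  { className: "UserService", methodName: "sync", startTime: "2019-01-22 12:30:00", timeConsumed: 7200 },
];

const param = "2019-01-22 00:00:00";

const slowCalls = logs
  // Both conditions must hold, like the two fields in the find() filter.
  .filter((d) => d.startTime > param && d.timeConsumed > 2000)
  // Descending by timeConsumed, like .sort({timeConsumed:-1}).
  .sort((a, b) => b.timeConsumed - a.timeConsumed)
  // Keep at most 50 results, like .limit(50).
  .slice(0, 50)
  // Keep only selected fields, like the projection document.
  .map(({ className, methodName, timeConsumed }) => ({ className, methodName, timeConsumed }));

console.log(slowCalls.map((d) => d.methodName)); // [ 'sync', 'create' ]
```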
[Count with sort]
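// The collection name was missing from the original snippet; "collection" is a placeholder.
// Count slow calls within one day per (className, methodName, subscriber_id), most frequent first.
var startTime = "2019-01-24 00:00:00",
    endTime   = "2019-01-24 23:59:59";
db.collection.aggregate([
    {$match: {
        "startTime": { "$gte": startTime },
        "endTime": { "$lte": endTime },
        "timeConsumed": { "$gt": 2000 }
    }},
    {$group: {
        _id: {
            "className": "$className",
            "methodName": "$methodName",
            "subscriber_id": "$subscriber_id"
        },
        cnt: { "$sum": 1 }
    }},
    {$sort: { cnt: -1 }}
])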
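Grouping on a compound _id means that two documents land in the same group only when all three key fields match. A minimal plain-JavaScript sketch of the match, group, and sort steps follows; the records are invented for illustration, and serializing the key with JSON.stringify is just one convenient way to compare compound keys in this sketch.

```javascript
// Hypothetical call logs; values are made up for illustration.
const callLogs = [
  { className: "OrderService", methodName: "create", subscriber_id: 1, startTime: "2019-01-24 09:00:00", endTime: "2019-01-24 09:00:03", timeConsumed: 3000 },
  { className: "UserService", methodName: "sync", subscriber_id: 2, startTime: "2019-01-24 10:00:00", endTime: "2019-01-24 10:00:05", timeConsumed: 5000 },
  { className: "OrderService", methodName: "create", subscriber_id: 1, startTime: "2019-01-24 11:00:00", endTime: "2019-01-24 11:00:04", timeConsumed: 4000 },
  { className: "OrderService", methodName: "create", subscriber_id: 1, startTime: "2019-01-25 09:00:00", endTime: "2019-01-25 09:00:03", timeConsumed: 3000 },
];

const startTime = "2019-01-24 00:00:00";
const endTime = "2019-01-24 23:59:59";

const buckets = new Map();
for (const d of callLogs) {
  // $match: keep only slow calls that fall inside the time window.
  const matches = d.startTime >= startTime && d.endTime <= endTime && d.timeConsumed > 2000;
  if (!matches) continue;

  // $group: build the compound key and count one per matching document.
  const id = { className: d.className, methodName: d.methodName, subscriber_id: d.subscriber_id };
  const key = JSON.stringify(id);
  const bucket = buckets.get(key) || { _id: id, cnt: 0 };
  bucket.cnt += 1;
  buckets.set(key, bucket);
}

// $sort: most frequent combination first.
const hotspots = [...buckets.values()].sort((a, b) => b.cnt - a.cnt);
console.log(hotspots);
```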
