Mongodb数据统计分析:groupby、count(distinctuserId)
最近在做mongodb数据统计查询,需求:统计⼀段时间内每天的分享次数和分享⼈数? 查了众多资料,居然未到好的⽅案,最终还是⾃
⼰写函数解决了,现分享出来(若有好的⽅法请指出):
1. 表t_share_log结构(习惯这么叫,虽mongodb中为collection):
字段名类型说明
id string记录id
gmtCreate date创建时间
userId int⽤户id
url string分享地址
2. 那mongodb中如何写呢?我们先看mysql写法:
select
date_format(gmtCreate,'%Y%m%d') day,
count(1) shareCount,
count(distinct userId) shareUserCount
from t_share_log
where gmtCreate between str_to_date(20161001, '%Y%m%d') and str_to_date(20161231, '%Y%m%d');
那么在mongodb中如何表达呢? ⼏经调试,先通过定义keyf的function格式化group by的⽇期值,再通过定义reduce:function函数借助userIdMap去重userId(相当于dintinct userId),最后完整的命令如下:
mongodb命令⾏:
db.runCommand({group:mongodb和mysql结合
{
ns:"t_share_log",
cond : { "$and":[{"gmtCreate":{"$gt":new ISODate("2016-10-01T00:00:00.000Z")}}, {"gmtCreate":{"$lt":new ISODate("2016-12-31T23:59:59.999Z"
$keyf:function(doc){
var myDate = new Create);
var mm = '0'+(Month()+1);
var dd = '0'+Date();
return {FullYear()+''+mm.substring(mm.length-2)+''+dd.substring(dd.length-2)};
},
initial:{"shareCount" : 0 , "shareUserCount" : 0 , "userIdMap" : {}},
$reduce:function(doc, prev){
if(doc.userId != null){
prev.shareCount ++;
if(prev.userIdMap[doc.userId] == null) {
prev.shareUserCount ++;
prev.userIdMap[doc.userId] = 1;
}
}
},
finalize: function(doc){delete doc.userIdMap; }
}
});
运⾏以上语句,得到如下结果(正确⽆误,是不是很有意思呢):
{
"retval" : [
{
"day" : "20161129",
"shareCount" : 12,
"shareUserCount" : 4
},
{
"day" : "20161130",
"shareCount" : 21,
"shareUserCount" : 9
}
]
,
"count" : NumberLong(174),
"keys" : NumberLong(8),
"ok" : 1
}
3.最后我们⽤java来实现,代码如下:
private GroupByResults<Map> queryShareStatistics(String beginTime, String endTime)
{
Criteria criteria = null;
if(!StringUtils.isEmpty(beginTime)) {
try
{
criteria = Criteria.where("gmtCreate").gte(Utils.SDF_FULLTIME_FORMAT.parse(beginTime));
}
catch (ParseException e)
{
<(e);
}
}
if(!StringUtils.isEmpty(endTime)) {
try{
if(criteria == null) {
criteria = Criteria.where("gmtCreate").lte(Utils.SDF_FULLTIME_FORMAT.parse(endTime));
}else {
criteria.lte(Utils.SDF_FULLTIME_FORMAT.parse(endTime));
}
}
catch (ParseException e)
{
<(e);
}
}
if(criteria == null) {
criteria = new Criteria();
}
//按⽇统计分享次数与⼈数
GroupBy groupBy = GroupBy.keyFunction("function(doc){"
+ "var myDate = new Create);"
+ "          var mm = '0'+(Month()+1); "
+ "          var dd = '0'+Date();"
+ "          return {FullYear()+''+mm.substring(mm.length-2)+''+dd.substring(dd.length-2)};"
+ "}");
groupBy.initialDocument("{shareCount:0, shareUserCount:0, userIdMap:{}}")
.reduceFunction("function(doc, prev){ "
+ "if(doc.userId != null){"
+ "  prev.shareCount ++;"
+ "  if(prev.userIdMap[doc.userId] == null) { "
+ "      prev.shareUserCount ++; "
+ "      prev.userIdMap[doc.userId] = 1;"
+ "  }"
+ "}"
+ "}")
.finalizeFunction("function(prev){delete prev.userIdMap}");
GroupByResults<Map> r = up(criteria, "t_share_log", groupBy, Map.class);
return r;
}
综上所述,先通过定义keyf的function格式化group by的⽇期值,再通过定义reduce:function函数借助userIdMap去重userId(相当于dintinct userId)。欢迎⼤家提出更好的办法!

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。