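Classic MySQL Queries: Computing Online Duration
Overview
Suppose we have a table that records user logins and logouts. We need to compute each user's total online time from these records and set the user's level accordingly, much like earning levels on QQ by staying online.
The table data is as follows: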
mysql> select * from user_sessions;
+----+----------+-------+----------+----------+
| id | platform | usr   | start    | end      |
+----+----------+-------+----------+----------+
|  1 | web      | user1 | 08:30:00 | 10:30:00 |
|  2 | web      | user2 | 08:30:00 | 08:45:00 |
|  3 | web      | user1 | 09:00:00 | 09:30:00 |
|  4 | web      | user2 | 09:15:00 | 10:30:00 |
|  5 | web      | user1 | 09:15:00 | 09:30:00 |
|  6 | web      | user2 | 10:30:00 | 14:30:00 |
|  7 | web      | user1 | 10:45:00 | 11:30:00 |
|  8 | web      | user2 | 11:00:00 | 12:30:00 |
|  9 | app      | user1 | 08:30:00 | 08:45:00 |
| 10 | app      | user2 | 09:00:00 | 09:30:00 |
| 11 | app      | user1 | 11:45:00 | 12:00:00 |
| 12 | app      | user2 | 12:30:00 | 14:00:00 |
| 13 | app      | user1 | 12:45:00 | 13:30:00 |
| 14 | app      | user2 | 13:00:00 | 14:00:00 |
| 15 | app      | user1 | 14:00:00 | 16:30:00 |
| 16 | app      | user2 | 15:30:00 | 17:00:00 |
+----+----------+-------+----------+----------+
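Our site has two entry points: web and app.
Clearly, these records overlap in time. For example, between 08:30 and 10:30 user1 is logged in through three web sessions and one app session, yet only 2 hours should be counted.
To get the totals right, we need three steps:
1. Find the earliest start time of each group of overlapping sessions for a user
2. Find the latest end time of each group of overlapping sessions for a user
3. Pair the start times and end times from the first two steps into merged time spans, then aggregate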
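As a cross-check on the three steps above, here is a minimal Python sketch of the same idea using a sort-and-sweep merge instead of SQL. It is not the article's method; the function name `merge_intervals` and the tuple layout are assumptions made for illustration. Times are zero-padded `HH:MM:SS` strings, so plain string comparison orders them correctly.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) pairs into disjoint spans."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Starts inside (or exactly at the end of) the current span: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Starts after the current span ends: open a new span.
            merged.append([start, end])
    return [tuple(span) for span in merged]


# user1's sessions between 08:30 and 10:30 from the sample table (ids 1, 3, 5, 9).
user1_morning = [
    ("08:30:00", "10:30:00"),
    ("09:00:00", "09:30:00"),
    ("09:15:00", "09:30:00"),
    ("08:30:00", "08:45:00"),
]
print(merge_intervals(user1_morning))
```

The four overlapping sessions collapse into the single span 08:30:00 to 10:30:00, which is the 2 hours mentioned above.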
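Step 1:
Select the user name and start time.
A start time qualifies only if it is not contained in another session, that is, no other record satisfies "s>start and s<=end", where s is the candidate start time and start/end belong to the other record.
The result may contain duplicates, so use distinct to remove them.
The resulting s values are the earliest times of each user's overlapping groups, as shown below: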
mysql> select distinct usr,start from user_sessions as a
    -> where not exists(select * from user_sessions as b
    ->   where a.usr=b.usr and a.start>b.start and a.start<=b.end);
+-------+----------+
| usr   | start    |
+-------+----------+
| user1 | 08:30:00 |
| user2 | 08:30:00 |
| user1 | 10:45:00 |
| user2 | 09:00:00 |
| user1 | 11:45:00 |
| user1 | 12:45:00 |
| user1 | 14:00:00 |
| user2 | 15:30:00 |
+-------+----------+
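To see why the boundary is strict on one side and inclusive on the other, the NOT EXISTS test can be mirrored in a few lines of Python. This is only an illustrative sketch: `uncovered_starts` is a hypothetical helper, and the data is user2's rows from the sample table.

```python
def uncovered_starts(sessions):
    """Return start times s for which no session satisfies b_start < s <= b_end."""
    return sorted({
        s for s, _ in sessions
        if not any(b_start < s <= b_end for b_start, b_end in sessions)
    })


# user2's sessions (ids 2, 4, 6, 8, 10, 12, 14, 16) as (start, end) strings.
user2 = [
    ("08:30:00", "08:45:00"), ("09:15:00", "10:30:00"),
    ("10:30:00", "14:30:00"), ("11:00:00", "12:30:00"),
    ("09:00:00", "09:30:00"), ("12:30:00", "14:00:00"),
    ("13:00:00", "14:00:00"), ("15:30:00", "17:00:00"),
]
print(uncovered_starts(user2))
```

The start 10:30:00 is dropped because the session 09:15:00 to 10:30:00 ends exactly when it begins, so sessions that touch are treated as one continuous span. A session never excludes its own start, because the left comparison is strict.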
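Step 2:
Select the user name and end time.
This works the same way as step 1. An end time qualifies only if it is not contained in another session, that is, no other record satisfies "e>=start and e<end":
mysql> select distinct usr,end from user_sessions as a
    -> where not exists(select * from user_sessions as b
    ->   where a.usr=b.usr and a.end>=b.start and a.end<b.end);
+-------+----------+
| usr   | end      |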
+-------+----------+
| user1 | 10:30:00 |
| user2 | 08:45:00 |
| user2 | 14:30:00 |
| user1 | 11:30:00 |
| user1 | 12:00:00 |
| user1 | 13:30:00 |
| user1 | 16:30:00 |
| user2 | 17:00:00 |
+-------+----------+
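The mirrored condition for end times can be sketched the same way. Again this is an assumption-laden illustration rather than the article's code: `uncovered_ends` is a hypothetical helper, and the data is user1's rows from the sample table.

```python
def uncovered_ends(sessions):
    """Return end times e for which no session satisfies b_start <= e < b_end."""
    return sorted({
        e for _, e in sessions
        if not any(b_start <= e < b_end for b_start, b_end in sessions)
    })


# user1's sessions (ids 1, 3, 5, 7, 9, 11, 13, 15) as (start, end) strings.
user1 = [
    ("08:30:00", "10:30:00"), ("09:00:00", "09:30:00"),
    ("09:15:00", "09:30:00"), ("10:45:00", "11:30:00"),
    ("08:30:00", "08:45:00"), ("11:45:00", "12:00:00"),
    ("12:45:00", "13:30:00"), ("14:00:00", "16:30:00"),
]
print(uncovered_ends(user1))
```

Here the strict comparison is on the right, so a session never excludes its own end, while an end that coincides with another session's start is still treated as covered.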
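Step 3:
Merge the results of the first two steps.
To simplify the queries and the discussion, we create a view for each of the first two steps:
mysql> create view v_s as select distinct usr,start from user_sessions as a
    -> where not exists(select * from user_sessions as b
    ->   where a.usr=b.usr and a.start>b.start and a.start<=b.end);
mysql> create view v_e as select distinct usr,end from user_sessions as a
    -> where not exists(select * from user_sessions as b
    ->   where a.usr=b.usr and a.end>=b.start and a.end<b.end);
Take a look at the two views v_s and v_e: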
mysql> select * from v_s;
+-------+----------+
| usr   | start    |
+-------+----------+
| user1 | 08:30:00 |
| user2 | 08:30:00 |
| user1 | 10:45:00 |
| user2 | 09:00:00 |
| user1 | 11:45:00 |
| user1 | 12:45:00 |
| user1 | 14:00:00 |
| user2 | 15:30:00 |
+-------+----------+
mysql> select * from v_e;
+-------+----------+
| usr   | end      |
+-------+----------+
| user1 | 10:30:00 |
| user2 | 08:45:00 |
| user2 | 14:30:00 |
| user1 | 11:30:00 |
| user1 | 12:00:00 |
| user1 | 13:30:00 |
| user1 | 16:30:00 |
| user2 | 17:00:00 |
+-------+----------+
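As you can see, the two views have the same number of rows, but the rows do not line up one to one, so we cannot simply place the two results side by side!
Approach 1:
For each start time in v_s, we look in v_e for the smallest end that satisfies "same usr and end>start".
For example, for user1's start time 08:30:00 in v_s, we find the records in v_e where end>08:30:00 and usr=user1, then take the min, which gives 10:30:00 as the matching end time.
The SQL and its result: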
mysql> select distinct v_s.usr,v_s.start,
    ->   (select min(end) from v_e where v_e.end>v_s.start and v_e.usr=v_s.usr) as end
    -> from v_s,v_e where v_s.usr=v_e.usr;
+-------+----------+----------+
| usr   | start    | end      |
+-------+----------+----------+
| user1 | 08:30:00 | 10:30:00 |
| user2 | 08:30:00 | 08:45:00 |
| user1 | 10:45:00 | 11:30:00 |
| user2 | 09:00:00 | 14:30:00 |
| user1 | 11:45:00 | 12:00:00 |
| user1 | 12:45:00 | 13:30:00 |
| user1 | 14:00:00 | 16:30:00 |
| user2 | 15:30:00 | 17:00:00 |
+-------+----------+----------+
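The pairing rule of Approach 1 (for each start, the smallest end that is later than it) can be sketched in Python as follows. `pair_spans` is a hypothetical name, and the two lists are user2's rows copied from v_s and v_e above.

```python
def pair_spans(starts, ends):
    """Pair each span start with the earliest span end that is later than it."""
    return [(s, min(e for e in ends if e > s)) for s in sorted(starts)]


# user2's rows from the v_s and v_e views.
v_s_user2 = ["08:30:00", "09:00:00", "15:30:00"]
v_e_user2 = ["08:45:00", "14:30:00", "17:00:00"]
print(pair_spans(v_s_user2, v_e_user2))
```

This reproduces user2's three rows in the result table. The sketch assumes every start has at least one later end, which holds for the views because each merged span contributes exactly one start and one end.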
On top of this, group by usr and sum each user's total duration to get the final result:
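mysql> select usr,sec_to_time(sum(timestampdiff(second,start,end))) as time
    -> from (select distinct v_s.usr,v_s.start,
    ->   (select min(end) from v_e where v_e.end>v_s.start and v_e.usr=v_s.usr) as end
    ->   from v_s,v_e where v_s.usr=v_e.usr) as t
    -> group by usr;
+-------+----------+
| usr   | time     |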
+-------+----------+
| user1 | 06:15:00 |
| user2 | 07:15:00 |
+-------+----------+
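Here, the timestampdiff function computes the difference between two time values; its first argument specifies the unit, which is seconds in this case.
The sec_to_time function converts the summed seconds back into a time format for display.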
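The arithmetic behind those two functions can be checked by hand with a short Python sketch. The helpers `to_seconds` and `to_hms` are hypothetical stand-ins written for this example, not MySQL APIs, and the spans are user1's merged rows from the result table above.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total):
    """Format a non-negative number of seconds as HH:MM:SS."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# user1's merged spans from the Approach 1 result.
user1_spans = [
    ("08:30:00", "10:30:00"),
    ("10:45:00", "11:30:00"),
    ("11:45:00", "12:00:00"),
    ("12:45:00", "13:30:00"),
    ("14:00:00", "16:30:00"),
]
total = sum(to_seconds(end) - to_seconds(start) for start, end in user1_spans)
print(total, to_hms(total))
```

The five spans add up to 22500 seconds, which formats as 06:15:00 and agrees with user1's row in the final result.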
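Approach 2
Since the rows of v_s and v_e do not line up, we can sort both so that they do, and then merge them row by row.
To merge row by row, we add a row number to each subquery and match the row numbers: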
mysql> set @sno=0;
mysql> set @eno=0;
mysql> select s.usr,s.start,e.end
    -> from (select @sno:=@sno+1 as sno,usr,start from v_s order by usr,start) as s,
    ->   (select @eno:=@eno+1 as eno,usr,end from v_e order by usr,end) as e
    -> where s.sno=e.eno;
+-------+----------+----------+
| usr   | start    | end      |
+-------+----------+----------+
| user1 | 08:30:00 | 10:30:00 |
| user1 | 10:45:00 | 11:30:00 |
| user1 | 11:45:00 | 12:00:00 |
| user1 | 12:45:00 | 13:30:00 |
| user1 | 14:00:00 | 16:30:00 |
| user2 | 08:30:00 | 08:45:00 |
| user2 | 09:00:00 | 14:30:00 |
| user2 | 15:30:00 | 17:00:00 |
+-------+----------+----------+
The grouping and aggregation that follow are the same as in Approach 1.
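Matching by row number amounts to sorting both lists the same way and pairing them by position. A minimal Python sketch of that idea, with the rows copied from v_s and v_e in their original order (the variable names are assumptions for this example):

```python
# Rows of v_s and v_e as (usr, time) tuples, in the order the views returned them.
v_s = [
    ("user1", "08:30:00"), ("user2", "08:30:00"), ("user1", "10:45:00"),
    ("user2", "09:00:00"), ("user1", "11:45:00"), ("user1", "12:45:00"),
    ("user1", "14:00:00"), ("user2", "15:30:00"),
]
v_e = [
    ("user1", "10:30:00"), ("user2", "08:45:00"), ("user2", "14:30:00"),
    ("user1", "11:30:00"), ("user1", "12:00:00"), ("user1", "13:30:00"),
    ("user1", "16:30:00"), ("user2", "17:00:00"),
]

# Sorting by (usr, time) plays the role of "order by usr,start" / "order by usr,end";
# zip then pairs the k-th start with the k-th end, like matching sno with eno.
paired = []
for (s_usr, start), (e_usr, end) in zip(sorted(v_s), sorted(v_e)):
    assert s_usr == e_usr, "row counts per user must agree for this to work"
    paired.append((s_usr, start, end))

for row in paired:
    print(row)
```

Positional pairing is valid here because each user's merged spans are disjoint, so the k-th smallest start and the k-th smallest end of a user always belong to the same span.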
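Extension
In practice, we often apply a bonus multiplier to online time from a particular entry point to encourage users to use it more.
For example, to encourage app logins, we might count app time double, so that one hour online in the app equals two hours on the web.
In that case the two platforms have to be computed separately. This is simple: just add a condition requiring platform to match. For example, step 1 becomes: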
mysql> select distinct usr,platform,start from user_sessions as a
    -> where not exists(select * from user_sessions as b
    ->   where a.usr=b.usr and a.platform=b.platform
    ->   and a.start>b.start and a.start<=b.end);
+-------+----------+----------+
| usr   | platform | start    |
+-------+----------+----------+
| user1 | web      | 08:30:00 |
| user2 | web      | 08:30:00 |
| user2 | web      | 09:15:00 |
| user1 | web      | 10:45:00 |
| user1 | app      | 08:30:00 |
| user2 | app      | 09:00:00 |
| user1 | app      | 11:45:00 |
| user2 | app      | 12:30:00 |
| user1 | app      | 12:45:00 |
| user1 | app      | 14:00:00 |
| user2 | app      | 15:30:00 |
+-------+----------+----------+
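The remaining steps are similar. In the final aggregation, compute the web and app durations separately, multiply the app duration by 2, and add it to the web duration. The details are omitted here.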
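Since the SQL for the weighted total is left out, the following Python sketch shows one way to compute what the result should look like for the sample data. It merges sessions per (usr, platform) with a sort-and-sweep pass rather than with the article's views, and the names `WEIGHTS`, `seconds`, and `merged_seconds` are assumptions introduced for this example.

```python
from collections import defaultdict

WEIGHTS = {"web": 1, "app": 2}  # app time counts double, as in the example rule


def seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s


def merged_seconds(intervals):
    """Length in seconds of the union of (start, end) pairs given in seconds."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)      # overlaps or touches: extend
        else:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous span
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# The sample user_sessions rows as (platform, usr, start, end).
SESSIONS = [
    ("web", "user1", "08:30:00", "10:30:00"), ("web", "user2", "08:30:00", "08:45:00"),
    ("web", "user1", "09:00:00", "09:30:00"), ("web", "user2", "09:15:00", "10:30:00"),
    ("web", "user1", "09:15:00", "09:30:00"), ("web", "user2", "10:30:00", "14:30:00"),
    ("web", "user1", "10:45:00", "11:30:00"), ("web", "user2", "11:00:00", "12:30:00"),
    ("app", "user1", "08:30:00", "08:45:00"), ("app", "user2", "09:00:00", "09:30:00"),
    ("app", "user1", "11:45:00", "12:00:00"), ("app", "user2", "12:30:00", "14:00:00"),
    ("app", "user1", "12:45:00", "13:30:00"), ("app", "user2", "13:00:00", "14:00:00"),
    ("app", "user1", "14:00:00", "16:30:00"), ("app", "user2", "15:30:00", "17:00:00"),
]

by_key = defaultdict(list)
for platform, usr, start, end in SESSIONS:
    by_key[(usr, platform)].append((seconds(start), seconds(end)))

weighted = defaultdict(int)
for (usr, platform), spans in by_key.items():
    weighted[usr] += WEIGHTS[platform] * merged_seconds(spans)

print(dict(weighted))
```

Under these assumptions user1 ends up with 36900 weighted seconds (10:15:00) and user2 with 45000 (12:30:00). Because overlaps are only merged within a platform, time spent on web and app simultaneously is counted on both.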