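MySQL: Assigning Grades by Score_Data Analysis Interviews: SQL Edition
Preface
Some of the SQL in this article may not cover every edge case; corrections are welcome.
Article structure:
1. Advanced SQL functions: window functions
2. Pinduoduo interview questions
Case study: product order data
Case study: campaign operations data
Case study: user behavior path analysis
Case study: user retention analysis
Case study: statistical features (median, mode, quartiles)
Case study: week-over-week GMV statistics
Case study: consecutive ranges
3. Yuanfudao interview questions
Case study: student score analysis
Case study: student exercise analysis
4. Frequently asked Hive topic: rows to columns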
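I. Advanced SQL Functions: Window Functions
1. Window functions
Differences between window functions and ordinary aggregate functions:
① An aggregate function collapses multiple rows into one; a window function is evaluated for every row, so the number of rows out equals the number of rows in.
② Aggregate functions can also be used as window functions.
The reason is the (logical) execution order: window functions are evaluated after FROM, JOIN, WHERE, GROUP BY, and HAVING, and before ORDER BY, LIMIT, and SELECT DISTINCT. By the time they run, any GROUP BY aggregation has already finished, so they do not aggregate the data any further.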
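To make the row-count difference concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for MySQL 8.0, and the four sample rows in t_score are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_score (stu_id INTEGER, lesson_id TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO t_score VALUES (?, ?, ?)",
    [(1, "L001", 98), (2, "L001", 84), (1, "L002", 86), (2, "L002", 90)],
)

# Aggregate with GROUP BY: four rows collapse into one row per lesson.
grouped = conn.execute(
    "SELECT lesson_id, AVG(score) FROM t_score "
    "GROUP BY lesson_id ORDER BY lesson_id"
).fetchall()

# Same aggregate as a window function: every input row is kept.
windowed = conn.execute(
    "SELECT stu_id, lesson_id, score, "
    "AVG(score) OVER (PARTITION BY lesson_id) AS lesson_avg "
    "FROM t_score ORDER BY lesson_id, stu_id"
).fetchall()

print(len(grouped), grouped)
print(len(windowed), windowed)
```

The grouped query returns two rows, while the windowed query returns all four rows, each carrying its lesson's average.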
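Note: window functions run after WHERE, so a WHERE clause cannot filter on a window function directly. Compute the window function in a subquery and apply the condition in the outer query, for example:
select user_id, avg(diff)
from
(
    -- timestampdiff returns the gap in seconds; subtracting two datetime values directly does not yield a duration in MySQL
    select user_id, log_time,
           timestampdiff(second, log_time, lead(log_time) over(partition by user_id order by log_time)) as diff
    from user_log
) t
where datediff(now(), t.log_time) <= 30
group by user_id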
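The same rule can be checked with a small experiment. This is a sketch only: it uses Python's sqlite3 (SQLite 3.25 or later) rather than MySQL, and the user_log rows are made up. The first query puts a window function directly in WHERE and is rejected; the second wraps it in a subquery and filters outside.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_log (user_id INTEGER, log_time TEXT)")
conn.executemany(
    "INSERT INTO user_log VALUES (?, ?)",
    [(1, "2018-08-01"), (1, "2018-08-03"), (2, "2018-08-02")],
)

# A window function directly inside WHERE is not allowed.
try:
    conn.execute(
        "SELECT user_id FROM user_log "
        "WHERE ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY log_time) = 1"
    )
    rejected = False
except sqlite3.Error:
    rejected = True

# Computing it in a subquery and filtering in the outer query works.
first_logs = conn.execute(
    """
    SELECT user_id, log_time
    FROM (
        SELECT user_id, log_time,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY log_time) AS rn
        FROM user_log
    ) t
    WHERE rn = 1
    ORDER BY user_id
    """
).fetchall()

print(rejected, first_logs)
```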
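2. Basic usage of window functions:
function_name OVER clause
The OVER keyword specifies the window the function operates on. If the parentheses after it are empty, the window contains all rows that satisfy the WHERE condition, and the function is computed over all of them. If they are not empty, the following four kinds of syntax can be used to define the window:
① window_name: gives the window an alias. When a query involves many windows, aliases make it clearer and easier to read.
② PARTITION BY clause: the columns by which rows are grouped; the window function is evaluated separately within each partition.
③ ORDER BY clause: the columns by which rows are sorted; the window function processes rows in that sorted order.
④ Frame clause: a frame is a subset of the current partition, and the clause defines the rules for that subset. It is typically used as a sliding window.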
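The options above can be compared side by side in one query. The sketch below assumes Python's sqlite3 (SQLite 3.25 or later) in place of MySQL, and daily_gmv is a hypothetical table invented for the illustration. It shows an empty OVER (), a named window with ORDER BY, and a named window with a frame clause acting as a two-row sliding window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_gmv (dt TEXT, gmv INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO daily_gmv VALUES (?, ?)",
    [("2018-08-01", 10), ("2018-08-02", 20), ("2018-08-03", 30), ("2018-08-04", 40)],
)

rows = conn.execute(
    """
    SELECT dt, gmv,
           SUM(gmv) OVER ()      AS total,          -- empty OVER: all rows
           SUM(gmv) OVER w_run   AS running_total,  -- ORDER BY only: frame ends at the current row
           AVG(gmv) OVER w_slide AS moving_avg      -- frame clause: previous row and current row
    FROM daily_gmv
    WINDOW w_run   AS (ORDER BY dt),
           w_slide AS (ORDER BY dt ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
    ORDER BY dt
    """
).fetchall()

for row in rows:
    print(row)
```

With these sample values, total is 100 on every row, running_total grows 10, 30, 60, 100, and moving_avg is 10.0, 15.0, 25.0, 35.0.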
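3. (Interview topic) Ranking functions: differences between row_number(), rank(), and dense_rank()
ROW_NUMBER(): sequential numbering, ties still get distinct numbers: 1, 2, 3
RANK(): ties share a rank and the following rank is skipped: 1, 1, 3
DENSE_RANK(): ties share a rank and no rank is skipped: 1, 1, 2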
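A quick way to see the three numbering schemes is to rank two tied scores followed by a lower one. This is a sketch using Python's sqlite3 (SQLite 3.25 or later) as a stand-in for MySQL 8.0; the three rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_score (stu_id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO t_score VALUES (?, ?)", [(1, 100), (2, 100), (3, 90)])

rows = conn.execute(
    """
    SELECT score,
           ROW_NUMBER() OVER w AS rn,
           RANK()       OVER w AS rk,
           DENSE_RANK() OVER w AS drk
    FROM t_score
    WINDOW w AS (ORDER BY score DESC)
    ORDER BY rn
    """
).fetchall()

for row in rows:
    print(row)
```

Which of the two tied students receives row number 1 is not defined by the query, so stu_id is left out of the output on purpose.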
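4. Distribution functions: percent_rank(), cume_dist()
percent_rank():
Each row is computed as (rank - 1) / (rows - 1), where rank is the value produced by RANK() and rows is the total number of rows in the current window.
-- Give the window an alias: WINDOW w AS (PARTITION BY stu_id ORDER BY score); here rows = 5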
mysql> SELECT
-> RANK() OVER w AS rk,
-> PERCENT_RANK() OVER w AS prk,
-> stu_id, lesson_id, score
-> FROM t_score
-> WHERE stu_id = 1
-> WINDOW w AS (PARTITION BY stu_id ORDER BY score)
-> ;
+----+------+--------+-----------+-------+
| rk | prk  | stu_id | lesson_id | score |
+----+------+--------+-----------+-------+
|  1 |    0 |      1 | L003      |    79 |
|  2 | 0.25 |      1 | L002      |    86 |
|  3 |  0.5 |      1 | L004      |    88 |
|  4 | 0.75 |      1 | L005      |    98 |
|  4 | 0.75 |      1 | L001      |    98 |
+----+------+--------+-----------+-------+
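The formula (rank - 1) / (rows - 1) can be checked against the built-in function. The sketch below loads the five rows shown in the output above into an in-memory SQLite database (Python's sqlite3, SQLite 3.25 or later, used here as an assumption in place of MySQL) and computes the value both ways; manual_prk is a name introduced for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_score (stu_id INTEGER, lesson_id TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO t_score VALUES (?, ?, ?)",
    [(1, "L001", 98), (1, "L002", 86), (1, "L003", 79), (1, "L004", 88), (1, "L005", 98)],
)

rows = conn.execute(
    """
    SELECT lesson_id, score,
           RANK() OVER w AS rk,
           PERCENT_RANK() OVER w AS prk,
           -- the formula written out by hand: (rank - 1) / (rows - 1)
           (RANK() OVER w - 1) * 1.0
               / (COUNT(*) OVER (PARTITION BY stu_id) - 1) AS manual_prk
    FROM t_score
    WHERE stu_id = 1
    WINDOW w AS (PARTITION BY stu_id ORDER BY score)
    ORDER BY score, lesson_id
    """
).fetchall()

for row in rows:
    print(row)
```

The two tied scores of 98 both get rank 4, so both get (4 - 1) / (5 - 1) = 0.75.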
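cume_dist():
The number of rows in the partition whose rank value is less than or equal to the current row's, divided by the total number of rows in the partition. Example: find the proportion of scores less than or equal to the current score.
-- cd1: no partitioning, so all rows form one group with 8 rows in total
-- cd2: partitioned by lesson_id into two groups of 4 rows each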
mysql> SELECT stu_id, lesson_id, score,
-> CUME_DIST() OVER (ORDER BY score) AS cd1,
-> CUME_DIST() OVER (PARTITION BY lesson_id ORDER BY score) AS cd2
-> FROM t_score
-> WHERE lesson_id IN ('L001','L002')
-> ;
+--------+-----------+-------+-------+------+
| stu_id | lesson_id | score | cd1   | cd2  |
+--------+-----------+-------+-------+------+
|      2 | L001      |    84 | 0.125 | 0.25 |
|      1 | L001      |    98 |  0.75 |  0.5 |
|      4 | L001      |    99 | 0.875 | 0.75 |
|      3 | L001      |   100 |     1 |    1 |
|      1 | L002      |    86 |  0.25 | 0.25 |
|      4 | L002      |    88 | 0.375 |  0.5 |
|      2 | L002      |    90 |   0.5 | 0.75 |
|      3 | L002      |    91 | 0.625 |    1 |
+--------+-----------+-------+-------+------+
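To confirm how cd1 and cd2 are derived, the sketch below reloads the eight rows from the output above and recomputes the ratio by hand. It assumes Python's sqlite3 (SQLite 3.25 or later) in place of MySQL; manual_cume_dist is a helper written for this illustration, not a library function.

```python
import sqlite3

data = [
    (2, "L001", 84), (1, "L001", 98), (4, "L001", 99), (3, "L001", 100),
    (1, "L002", 86), (4, "L002", 88), (2, "L002", 90), (3, "L002", 91),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_score (stu_id INTEGER, lesson_id TEXT, score INTEGER)")
conn.executemany("INSERT INTO t_score VALUES (?, ?, ?)", data)

rows = conn.execute(
    "SELECT stu_id, lesson_id, score, "
    "CUME_DIST() OVER (ORDER BY score) AS cd1, "
    "CUME_DIST() OVER (PARTITION BY lesson_id ORDER BY score) AS cd2 "
    "FROM t_score ORDER BY lesson_id, score"
).fetchall()


def manual_cume_dist(score, group):
    """Rows with a value <= the current one, divided by the group size."""
    return sum(1 for s in group if s <= score) / len(group)


all_scores = [s for _, _, s in data]
matches = []
for stu_id, lesson_id, score, cd1, cd2 in rows:
    lesson_scores = [s for _, l, s in data if l == lesson_id]
    matches.append(
        cd1 == manual_cume_dist(score, all_scores)
        and cd2 == manual_cume_dist(score, lesson_scores)
    )

print(all(matches))
```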
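5. Lag and lead functions: lag(expr,n), lead(expr,n)
Purpose: return the value of expr from the row n rows before (LAG(expr,n)) or n rows after (LEAD(expr,n)) the current row.
Use case: find the difference between the current student's score and the score of the student ranked one place before.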
mysql> SELECT stu_id, lesson_id, score, pre_score,
-> score-pre_score AS diff
-> FROM(
-> SELECT stu_id, lesson_id, score,
-> LAG(score,1) OVER w AS pre_score
-> FROM t_score
-> WHERE lesson_id IN ('L001','L002')
-> WINDOW w AS (PARTITION BY lesson_id ORDER BY score)) t
-> ;
+--------+-----------+-------+-----------+------+
| stu_id | lesson_id | score | pre_score | diff |
+--------+-----------+-------+-----------+------+
|      2 | L001      |    84 |      NULL | NULL |
|      1 | L001      |    98 |        84 |   14 |
|      4 | L001      |    99 |        98 |    1 |
|      3 | L001      |   100 |        99 |    1 |
|      1 | L002      |    86 |      NULL | NULL |
|      4 | L002      |    88 |        86 |    2 |
|      2 | L002      |    90 |        88 |    2 |
|      3 | L002      |    91 |        90 |    1 |
+--------+-----------+-------+-----------+------+
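The NULL on the first row of each partition comes from LAG having no earlier row to read. Both LAG and LEAD accept an optional third argument that supplies a default in that case. The sketch below shows LAG, LEAD, and the default form on the L001 rows from the output above; it assumes Python's sqlite3 (SQLite 3.25 or later) as a stand-in for MySQL 8.0, and the column names next_score and diff_or_zero are introduced for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_score (stu_id INTEGER, lesson_id TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO t_score VALUES (?, ?, ?)",
    [(2, "L001", 84), (1, "L001", 98), (4, "L001", 99), (3, "L001", 100)],
)

rows = conn.execute(
    """
    SELECT stu_id, score,
           LAG(score, 1)  OVER w AS pre_score,   -- previous row, NULL on the first row
           LEAD(score, 1) OVER w AS next_score,  -- following row, NULL on the last row
           score - LAG(score, 1, score) OVER w AS diff_or_zero  -- default avoids the NULL
    FROM t_score
    WINDOW w AS (PARTITION BY lesson_id ORDER BY score)
    ORDER BY score
    """
).fetchall()

for row in rows:
    print(row)
```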
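6. First and last value functions: FIRST_VALUE(expr), LAST_VALUE(expr)
Purpose: return the first (FIRST_VALUE(expr)) or last (LAST_VALUE(expr)) value of expr in the window frame.
Use case: with rows ordered by date, find the first and the last student's score up to the current row.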
mysql> SELECT stu_id, lesson_id, score, create_time,
-> FIRST_VALUE(score) OVER w AS first_score,
-> LAST_VALUE(score) OVER w AS last_score
-> FROM t_score
-> WHERE lesson_id IN ('L001','L002')
-> WINDOW w AS (PARTITION BY lesson_id ORDER BY create_time)
-> ;
+--------+-----------+-------+-------------+-------------+------------+
| stu_id | lesson_id | score | create_time | first_score | last_score |
+--------+-----------+-------+-------------+-------------+------------+
|      3 | L001      |   100 | 2018-08-07  |         100 |        100 |
|      1 | L001      |    98 | 2018-08-08  |         100 |         98 |
|      2 | L001      |    84 | 2018-08-09  |         100 |         99 |
|      4 | L001      |    99 | 2018-08-09  |         100 |         99 |
|      3 | L002      |    91 | 2018-08-07  |          91 |         91 |
|      1 | L002      |    86 | 2018-08-08  |          91 |         86 |
|      2 | L002      |    90 | 2018-08-09  |          91 |         90 |
|      4 | L002      |    88 | 2018-08-10  |          91 |         88 |
+--------+-----------+-------+-------------+-------------+------------+
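In the output above, last_score follows the current row because a window with ORDER BY and no frame clause defaults to a frame that ends at the current row (including rows tied on the sort key). To get the last value of the whole partition, the frame has to be widened explicitly. The sketch below shows both on the L002 rows; it assumes Python's sqlite3 (SQLite 3.25 or later) in place of MySQL, and last_so_far and last_overall are names introduced for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_score (stu_id INTEGER, score INTEGER, create_time TEXT)")
conn.executemany(
    "INSERT INTO t_score VALUES (?, ?, ?)",
    [(3, 91, "2018-08-07"), (1, 86, "2018-08-08"),
     (2, 90, "2018-08-09"), (4, 88, "2018-08-10")],
)

rows = conn.execute(
    """
    SELECT stu_id, score, create_time,
           -- default frame: from the start of the partition up to the current row
           LAST_VALUE(score) OVER (ORDER BY create_time) AS last_so_far,
           -- explicit frame covering the whole partition
           LAST_VALUE(score) OVER (
               ORDER BY create_time
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_overall
    FROM t_score
    ORDER BY create_time
    """
).fetchall()

for row in rows:
    print(row)
```

With the default frame, last_so_far simply repeats each row's own score; with the full frame, every row reports 88, the score on the latest date.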
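Pinduoduo Interview Questions
Note: some questions come from written tests. The interview questions were transcribed from other candidates' verbal accounts, so some formatting inconsistencies are unavoidable.
II. Case study: product order data
Data table:
Order table orders, with roughly these columns: order_id (order number), user_id (user ID), order_pay (order amount), order_time (order time), commodity_level_1 (first-level product category), commodity_level_2 (second-level product category)
1. For each first-level category, find the top 3 second-level categories by total transaction amount over the last 7 days: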
select commodity_level_1 as 'first_level_category',
       commodity_level_2 as 'second_level_category',
       total_pay as 'total_amount'
from
(
    -- the alias rk is used because rank is a reserved word in MySQL 8.0
    select commodity_level_1, commodity_level_2, total_pay,
           row_number() over(partition by commodity_level_1 order by total_pay desc) as rk
    from
    (
        select commodity_level_1,
               commodity_level_2,
               sum(order_pay) as total_pay
        from orders
        where datediff(now(), order_time) <= 7
        group by commodity_level_1, commodity_level_2
    ) a
) b
where rk <= 3
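The pattern in this answer (aggregate, number the rows within each group, then filter on the number in an outer query) can be tried on a few rows. This is a sketch under stated assumptions: it runs on Python's sqlite3 (SQLite 3.25 or later) instead of MySQL, the orders rows are invented, the 7-day filter is left out so the result does not depend on the current date, and it keeps the top 2 instead of the top 3 to keep the sample small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_pay INTEGER, "
    "commodity_level_1 TEXT, commodity_level_2 TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 100, "A", "a1"), (2, 50, "A", "a1"), (3, 200, "A", "a2"),
        (4, 30, "A", "a3"), (5, 70, "B", "b1"), (6, 90, "B", "b2"),
    ],
)

top2 = conn.execute(
    """
    SELECT commodity_level_1, commodity_level_2, total_pay
    FROM (
        SELECT commodity_level_1, commodity_level_2, total_pay,
               ROW_NUMBER() OVER (
                   PARTITION BY commodity_level_1 ORDER BY total_pay DESC
               ) AS rk
        FROM (
            SELECT commodity_level_1, commodity_level_2,
                   SUM(order_pay) AS total_pay
            FROM orders
            GROUP BY commodity_level_1, commodity_level_2
        ) a
    ) b
    WHERE rk <= 2
    ORDER BY commodity_level_1, rk
    """
).fetchall()

for row in top2:
    print(row)
```

ROW_NUMBER() returns exactly N rows per group even when totals tie; if tied categories should all be kept, RANK() or DENSE_RANK() would be the alternative.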
2. Extract the user_id values whose daily spending ranked 101-195 on each day from August 1 to August 10
select order_date as 'order_date',
user_id,
total_pay as 'total_spent'
from
(
-- the alias rk is used because rank is a reserved word in MySQL 8.0
select order_date, user_id, total_pay,
row_number() over(partition by order_date order by total_pay desc) as rk
from
(
select convert(order_time,date) as order_date,
user_id,