clickhouse与mysql查询速度对⽐
数据准备⼯作:
MySQL创建准备对⽐的表Bigtable并建⽴索引
CREATE TABLE Bigtable (
consumption_id VARCHAR(255),
member_id INT(11),
fans_id INT(11),
bill_date DATE,
money VARCHAR(255),
people_num INT(8),
dish_name VARCHAR(255),
created_org INT(8),
open_id VARCHAR(255),
subscribed_time DATETIME,
unsubscribed_time DATETIME,
source_type INT(8),
sns_type VARCHAR(255),
is_subscribed VARCHAR(255)
)ENGINE=INNODB DEFAULT CHARSET=utf8;
CREATE INDEX indexbigtable ON
bigtable(consumption_id,member_id,bill_date,created_org,source_type,sns_type,is_subscribed);
将数据插⼊Bigtable,共308万⾏。
INSERT INTO bigtable sumption_ber_id,b.fans_id,DATE(a.bill_date),a.need_pay_amount,a.people_num,a.dish_ated_o rg,b.open_id,b.subscribed_time,
b.unsubscribed_time,b.source_type,b.sns_type,b.is_subscribed
FROM (
sumption_ber_id,dish.dish_name,conn.bill_date,conn.people_d_pay_ated_org
FROM `pos_consumption` AS conn
INNER JOIN pos_dining_order AS porder sumption_id = sumption_id ated_org = ated_org
INNER JOIN pos_order_dish AS dish der_id=dish.dining_order_id ated_ated_org
WHERE conn.`status_code`=1
AND conn.`need_pay_amount` >0
) a
LEFT JOIN (SELECT fan.fans_ber_id,bind.open_id,bind.subscribed_time,bind.unsubscribed_time,fan.source_type,sns_type,is_subscribed FR OM rv_ol_fans AS fan
INNER JOIN `0001790455_baseinfo`.`ol_fans_sns_bind` AS bind USE INDEX (idx_fans_id) ON fan.fans_id=bind.fans_id WHERE bind.status_code=1)b O ber_ber_id
clickhouse建表bigtable,⼤⼩写⼀定要严格符合:
CREATE TABLE bigtable (\
consumption_id UInt64,\
member_id UInt64,\
fans_id UInt64,\
bill_date Date,\
money Float32,\
people_num UInt8,\
dish_name String,\
created_org UInt8,\
open_id String,\
subscribed_time DateTime,\
unsubscribed_time DateTime,\
source_type UInt8,\
sns_type UInt8,\
is_subscribed UInt8\
)
ENGINE=MergeTree(bill_date,(consumption_id,created_org),8192)
mysql导出数据到csv:
SELECT consumption_id,
member_id, fans_id, bill_date,
money, people_num,
dish_name, created_org,open_id,subscribed_time,unsubscribed_time,
source_type,sns_type,is_subscribed
INTO OUTFILE '/var/lib/mysql-files/bigtable.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n' FROM
bigtable;
服务器之间的csv拷贝:
scp bigtable.csv root@:/root/clickhouse-files
clickhouse导⼊csv:
cat /root/clickhouse-files/bigtable.csv | clickhouse-client --query="INSERT INTO test.bigtable FORMAT CSV"
数据量为308万⾏,每⾏14个列的测试:
select * from bigtable
clickhouse:67.62s ; mysql:0.002s
clickhouse感觉是瞬间查出来,但是加载10000⾏数据刷屏⽤了很久很久,clickhouse⾥查询的列数⼀多,加载的时间就变长TOPN
select* from bigtable order by dish_Name limit 1
clickhouse:0.13s ; mysql:0.020s
统计数据有多少⾏:
select count(1) from bigtable
clickhouse:0.015s ; mysql:1.33s
统计⼀共有多少个订单:
select count(1) from (select consumption_id,created_org from bigtable group by consumption_id,created_org)a;
clickhouse:0.121s ; mysql:5.15s
来吃过的次数⾥各有多少⼈:82⾏
select sum_all,count(1) as countall from (select member_id,count(1) as sum_all from
(select member_id,consumption_id,created_org from bigtable group by consumption_id,created_org) a group by member_id) a
group by sum_all;
clickhouse:0.166s ; mysql:5.50s
添加条件的查询:
SELECT COUNT(1) FROM bigtable
WHERE member_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,
60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,
76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,
95,96,97,98,99,100)
AND bill_date BETWEEN '2017-01-01' AND '2017-12-31'
AND fans_id >10
clickhouse:0.027s ; mysql:2.05s
每个⽤户最常吃的菜品:8万⾏
SELECT member_id,dish_name,sum_all FROM (
SELECT member_id,dish_name,COUNT(1) AS sum_all FROM bigtable WHERE dish_name NOT IN('⽶饭','打包盒','茶位费')
GROUP BY member_id,dish_name ORDER BY member_id,sum_all DESC) a GROUP BY member_id
clickhouse:11.90s ; mysql:9.88s
来个猛的,四重⼦查询:171万⾏
SELECT * FROM bigtable
WHERE member_id IN(SELECT member_id FROM bigtable WHERE member_id IN(SELECT member_id FROM bigtable WHERE source_type=1 AND me mber_id IN (SELECT member_id FROM bigtable WHERE source_type=1
AND member_id IN (SELECT member_id FROM bigtable WHERE fans_id !=0))
) AND is_subscribed=1 )
clickhouse:65.284s ; mysql:
Mysql不⾏了,查了30分钟还没结果;clickhouse同样是加载⾏数⽤了很久
再来⼀个⼆重⼦查询试试:
SELECT * FROM bigtable\
WHERE member_id IN(SELECT member_id FROM bigtable WHERE member_id IN(SELECT member_id FROM bigtable WHERE source_type=1 )\
AND is_subscribed=1 )
clickhouse:63.216s ; mysql:
没想到两个⼦查询的mysql在30分钟内也出不来结果
那么这部分对⽐查询时间的结果显⽽易见了:
对于简单查询来说,查询列数多的话mysql有优势,查询列数少的话clickhouse领先。
对于复杂查询来说,clickhouse占有显著优势
另外,展⽰⾏数的多少会影响clickhouse的查询时间,不知道是不是因为使⽤linux的原因
数据量为1亿两千四百万⾏,每⾏62个列的测试:
同样,先是select⼀下全表
select * from pdish_test
clickhouse:276s ; mysql:0.036s
统计⾏数:
select count(1) from pdish_test
clickhouse:0.044s ; mysql:32.168s
每个菜品点了多少次,共收⼊多少钱:377⾏
SELECT dish_name,COUNT(1) AS cc,SUM(dish_sale_amount) AS c FROM pdish_test GROUP BY dish_name ORDER BY cc DESC clickhouse:3.55s ; mysql:8min6s
简单的条件查询:
SELECT COUNT(1) FROM pdish_test WHERE dish_sale_amount>10 AND created_on >'2017-01-01'
clickhouse:0.448s ; mysql:
依然是30分钟过去了,mysql还是没有反应
看来对⾮常⼤的数据量来说,Mysql已经跑不动了
关联表查询,三个表分别为近100万,13万,13万:
三个表相互关联拼接的查询: 8万⾏
clickhouse:3.65s ; mysql:4min46s
在⼏万⼗⼏万⾏的数据⾥,clickhouse的速度也是要显著快于mysql
>join和in哪个查询更快
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。
发表评论