[laravel5] A detailed look at Laravel's chunk and chunkById functions -- 688IT编程网


1. Laravel's chunk and chunkById are mainly used to process large amounts of data by splitting it into chunks.

2. Pros and cons:

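1) Under the hood, chunk pages through the data with a page number (limit plus offset). If the callback updates a column that the query filters on, rows drop out of the result set between pages, and about half of the data can be skipped. MySQL offset pagination also badly hurts query performance on large tables.
2) Because of 1), chunkById exists as an optimized alternative. It pages by the primary key (or by another column, covered below). This avoids both the skipped-rows problem on update and slow offset pagination.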
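The skipped-rows problem in 1) can be reproduced without a database. The sketch below is a plain-PHP simulation, not Laravel code. It assumes the callback clears the same flag the query filters on (the hypothetical `pending` field). It then pages through 1,000 in-memory rows 100 at a time, once by offset as chunk does and once by the last seen id as chunkById does.

```php
<?php
// Plain-PHP simulation, no Laravel or database involved.
// The "table" is an array of rows keyed by id. The hypothetical `pending` flag plays the role of
// a column the query filters on (e.g. delivery_status = 1) and that the callback updates.

function makeTable(int $n): array
{
    $rows = [];
    for ($id = 1; $id <= $n; $id++) {
        $rows[$id] = ['id' => $id, 'pending' => true];
    }
    return $rows;
}

// Equivalent of: select id from t where pending = 1 order by id
function pendingIds(array $table): array
{
    $ids = [];
    foreach ($table as $row) {
        if ($row['pending']) {
            $ids[] = $row['id'];
        }
    }
    return $ids; // ascending, because rows were inserted in id order
}

// chunk-style: every page re-runs the query with limit $size offset ($page - 1) * $size
function processWithOffset(array $table, int $size): int
{
    $processed = 0;
    for ($page = 1; ; $page++) {
        $batch = array_slice(pendingIds($table), ($page - 1) * $size, $size);
        foreach ($batch as $id) {
            $table[$id]['pending'] = false; // the update done inside the callback
            $processed++;
        }
        if (count($batch) < $size) {
            return $processed;
        }
    }
}

// chunkById-style: every page re-runs the query with where id > $lastId order by id limit $size
function processWithKeyset(array $table, int $size): int
{
    $processed = 0;
    $lastId = 0;
    do {
        $after = array_values(array_filter(pendingIds($table), fn ($id) => $id > $lastId));
        $batch = array_slice($after, 0, $size);
        foreach ($batch as $id) {
            $table[$id]['pending'] = false;
            $processed++;
        }
        if ($batch) {
            $lastId = end($batch);
        }
    } while (count($batch) === $size);
    return $processed;
}

echo processWithOffset(makeTable(1000), 100), PHP_EOL; // 500: rows 101-200, 301-400, ... are never processed
echo processWithKeyset(makeTable(1000), 100), PHP_EOL; // 1000
```

With offset paging, each update shrinks the result set, so the next offset jumps over a block of rows that was never returned. Keyset paging only asks for rows after the last id it saw, so a shrinking result set does no harm.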

3. References:

www.lqwang/13.html

segmentfault/a/1190000015284897

learnku/articles/15655/the-use-of-offset-limit-or-skip-and-take-in-laravel

4. This post explains how to use chunkById and how it works under the hood. It does not cover chunk itself. Here is the practical code first:

<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use App\Model\OrderReturnTrace;
use App\Dao\UpPayOrderDao;
use App\Model\OrderPayTrace;
use App\Model\OrderTrace;

class OmsOrderDeliveryQuery extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'oms:odquery {order_id?}';

    protected $order_id;

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Query and update logistics info for orders awaiting shipment';

    /**
     * Create a new command instance.
     *
     * @return void
     */
    public function __construct()
    {
        parent::__construct();
    }

    /**
     * Execute the console command.
     *
     * @return mixed
     */
    public function handle()
    {
        $this->order_id = $this->argument('order_id') ?? '';

        try {
            if (!empty($this->order_id)) {
                # Single order passed on the command line
                $orderM = OrderTrace::where([
                    'order_id'        => $this->order_id,
                    'pay_status'      => 2,
                    'order_status'    => 1,
                    'delivery_status' => 1,
                ])->first();

                if (empty($orderM)) {
                    return;
                }

                if (!$res = UpPayOrderDao::omsDeliveryQuery($orderM->order_id)) {
                    return false;
                }

                # Update the logistics info
                UpPayOrderDao::orderDeliveryRelease($this->order_id, $res);

                return true;
            } else {
                # Run the OMS logistics query for one order
                $echoCsv = function ($item) {
                    if (!$res = UpPayOrderDao::omsDeliveryQuery($item->order_id)) {
                        log_write("Scheduled task: logistics query failed for order_id {$item->order_id}");
                        return false;
                    }

                    # Update the logistics info
                    UpPayOrderDao::orderDeliveryRelease($item->order_id, $res);
                };

                # The fields queried in this batch get updated, so use chunkById($count, $callback, $column, $alias)
                # instead of chunk to avoid missing rows
                $refundAll = OrderTrace::where('pay_status', 2)
                    ->where('order_type', 0)
                    ->where('order_status', 1)
                    ->where('pay_channel', 2)
                    ->where('delivery_status', 1)
                    ->select('id', 'order_id'); # 'id' must be selected: chunkById reads the last id from the results

                # `use ($echoCsv)` is required: PHP closures do not capture outer variables automatically
                $refundAll->chunkById(200, function ($orders) use ($echoCsv) {
                    foreach ($orders as $item) {
                        $echoCsv($item);
                    }
                }, 'id');
            }
        } catch (\Exception $e) {
            log_write("Scheduled task refundOrderUnionPayQuery, exception: msg={$e->getMessage()}");
            UpPayOrderDao::sendEmail('Scheduled task refundOrderUnionPayQuery, exception', $e->getMessage());
            return false;
        }

        return;
    }
}

5. How chunkById works internally (including paging by a column other than the primary key, covered below):

The following article explains it very clearly: www.lqwang/13.html

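Pitfalls of Laravel's chunk and chunkById

Our company's projects are gradually being migrated to the Laravel framework. While writing scheduled-task scripts, I used the chunk and chunkById APIs. These are the pitfalls I ran into.

I. Preface

The database engine is InnoDB.

Table structure (only the columns used in this article are listed):

Column      Type      Comment
id          int(11)   ID
type        int(11)   Type
mark_time   int(10)   Mark time (Unix timestamp)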

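Indexes (again, only the relevant ones):

Index name                  Columns
PRIMARY                     id
idx_sid_blogdel_marktime    type, blog_del, mark_time
Idx_marktime                mark_time

II. Requirements

Every day at 1 a.m., the script fetches all rows from the previous day whose type is 99, runs some checks on them, and then performs further operations. This article covers only the data-fetching stage.

The data is split into monthly tables. Each monthly table holds roughly 10 million rows.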

III. Processing data with chunk

The code:

$this->dao->where('type', 99)
    ->whereBetween('mark_time', [$date, $date + 86399])
    ->select(array('mark_time', 'id'))
    ->chunk(1000, function ($rows) {
        // business logic
    });

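The query selects rows from one month's table where type is 99 and the mark time falls between 00:00:00 and 23:59:59 of the given day. It can use the indexes on mark_time and type.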
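The `[$date, $date+86399]` range covers one whole day in Unix seconds: 24 × 3600 − 1 = 86399, and whereBetween includes both ends. Below is a minimal sketch for computing the bounds of "yesterday". It assumes UTC and a fixed reference time for illustration, and `yesterdayBounds` is a hypothetical helper. In a timezone with daylight saving time, some days are not 86,400 seconds long, so there the end bound should come from the next midnight instead.

```php
<?php
// Hypothetical helper: inclusive [start, end] Unix timestamps for the day before $now.
function yesterdayBounds(DateTimeImmutable $now): array
{
    $start = $now->setTime(0, 0, 0)->modify('-1 day')->getTimestamp();
    return [$start, $start + 86399]; // 86399 = 24 * 3600 - 1
}

// The script runs at 01:00 and processes the previous day (UTC assumed here).
$now = new DateTimeImmutable('2020-01-02 01:00:00', new DateTimeZone('UTC'));
[$date, $end] = yesterdayBounds($now);
echo $date, ' ', $end, PHP_EOL; // 1577836800 1577923199
```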

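There are roughly 150,000 to 250,000 rows with type 99 per day. Loading them all with ->get()->toArray() exhausts memory immediately, so chunk is used to fetch 1,000 rows at a time.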

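With chunk, memory no longer runs out, but performance is poor. As a rough estimate, fetching the last day from a month's table (about 200,000 rows) takes one to two minutes.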

The source shows that, under the hood, chunk simply adds a limit and an offset to the SQL statement:

select * from `users` order by `id` asc limit 500 offset 500;

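With a lot of data, each later page is slower than the one before, because of how MySQL's limit works under the hood. For

limit 10000, 10

MySQL scans the first 10,010 matching rows, throws away the first 10,000, and returns the last 10. With large data sets this performs very badly.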
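For a rough comparison, take about 200,000 rows for one day, read 1,000 at a time. The sketch below uses a simplified cost model: "limit offset, n" reads offset + n rows, while a keyset page seeks into the index and reads only the n rows it returns. Real costs also depend on the query plan and caching, so treat the numbers as an illustration, not a measurement.

```php
<?php
// Simplified cost model, not a benchmark: total rows MySQL reads to serve all pages.

// Offset paging: "limit $offset, $size" reads $offset + $size rows and discards the first $offset.
function rowsReadWithOffset(int $total, int $size): int
{
    $read = 0;
    for ($offset = 0; $offset < $total; $offset += $size) {
        $read += $offset + min($size, $total - $offset);
    }
    return $read;
}

// Keyset paging: "where id > :last_id order by id limit $size" seeks past last_id
// and then reads only the rows of the page.
function rowsReadWithKeyset(int $total, int $size): int
{
    $read = 0;
    for ($seen = 0; $seen < $total; $seen += $size) {
        $read += min($size, $total - $seen);
    }
    return $read;
}

echo rowsReadWithOffset(200000, 1000), PHP_EOL; // 20100000
echo rowsReadWithKeyset(200000, 1000), PHP_EOL; // 200000
```

Under this model, the total work of offset paging grows quadratically with the number of pages (about 100 times more rows read here). Keyset paging stays linear.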

Checking the API, I found that Laravel provides another method for this case: chunkById.

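IV. How chunkById works

Paging with limit and an offset slows down noticeably on large data sets, so chunkById pages by id instead. The idea is easy to follow. The SQL looks like this: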

select * from `users` where `id` > :last_id order by `id` asc limit 500;

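The API automatically remembers the last id. It then combines id > :last_id with limit, so paging goes through the primary key index and reads only the rows it needs. Performance improves noticeably.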

V. The chunkById pitfall

The API docs show chunk and chunkById being used in exactly the same way, so I switched the script to chunkById.

$this->dao->where('type', 99)
    ->whereBetween('mark_time', [$date, $date + 86399])
    ->select(array('mark_time', 'id'))
    ->chunkById(1000, function ($rows) {
        // business logic
    });

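When I ran the script, the data for January 1 and January 2 was processed without any problem, and much faster than before. But when processing December 31, the script never finished.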

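After some investigation, I found that the script never reached the business logic, which meant the SQL never finished. This puzzled me: the earlier runs had been fine, so why did December 31 fail? I checked what the database server was executing: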

show full processlist;

That revealed the problem. As the previous section said, chunkById orders by id and uses limit to fetch the data page by page. The SQL I had expected was:

select * from `table` where `type` = 99 and mark_time between :begin_date and :end_date limit 500;

Its EXPLAIN output:

select_type   type    key            rows      Extra
SIMPLE        Range   idx_marktime   2370258   Using index condition; Using where

The SQL that actually ran was:

select * from `table` where `type` = 99 and mark_time between :begin_date and :end_date order by id limit 500;

Its actual EXPLAIN output:

select_type   type    key       rows   Extra
SIMPLE        Index   PRIMARY   4379   Using where

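chunkById automatically adds order by id. With order by id and a small limit, the optimizer chose to walk the primary key index (InnoDB's clustered index) in id order, which avoids a sort, instead of using the mark_time index. That made the SQL extremely slow. The likely reason the first days were fast and December 31 was not: the rows of a month's first days sit near the start of the id range and are found quickly, while the last day's rows sit near the end, so almost the whole table is scanned.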

VI. Solution

I went back to the chunkById source:

/**
 * Chunk the results of a query by comparing numeric IDs.
 *
 * @param  int  $count
 * @param  callable  $callback
 * @param  string|null  $column
 * @param  string|null  $alias
 * @return bool
 */
public function chunkById($count, callable $callback, $column = null, $alias = null)
{
    $column = is_null($column) ? $this->getModel()->getKeyName() : $column;

    $alias = is_null($alias) ? $column : $alias;

    $lastId = null;

    do {
        $clone = clone $this;

        // We'll execute the query for the given page and get the results. If there are
        // no results we can just break and return from here. When there are results
        // we will call the callback with the current chunk of these results here.
        $results = $clone->forPageAfterId($count, $lastId, $column)->get();

        $countResults = $results->count();

        if ($countResults == 0) {
            break;
        }

        // On each chunk result set, we will pass them to the callback and then let the
        // developer take care of everything within the callback, which allows us to
        // keep the memory low for spinning through large result sets for working.
        if ($callback($results) === false) {
            return false;
        }

        $lastId = $results->last()->{$alias};

        unset($results);
    } while ($countResults == $count);

    return true;
}

The method takes four parameters: count, callback, column and alias.

column defaults to null, and the first line fills in a default value:

$column = is_null($column) ? $this->getModel()->getKeyName() : $column;

Following it further:

/**
 * Get the primary key for the model.
 *
 * @return string
 */
public function getKeyName()
{
    return $this->primaryKey;
}

/**
 * The primary key for the model.
 *
 * @var string
 */
protected $primaryKey = 'id';

So by default, column is the model's primary key, which is id.

Next, the forPageAfterId method:

/**
 * Constrain the query to the next "page" of results after a given ID.
 *
 * @param  int  $perPage
 * @param  int|null  $lastId
 * @param  string  $column
 * @return \Illuminate\Database\Query\Builder|static
 */
public function forPageAfterId($perPage = 15, $lastId = 0, $column = 'id')
{
    $this->orders = $this->removeExistingOrdersFor($column);

    if (! is_null($lastId)) {
        $this->where($column, '>', $lastId);
    }

    return $this->orderBy($column, 'asc')
                ->take($perPage); // take: how many rows to fetch
}

If lastId is not null (that is, from the second page on), the method adds a where column > lastId clause. It always adds order by column asc.
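To see which clauses forPageAfterId adds on each page, here is a plain-PHP sketch that mirrors its logic as a SQL string. `pageAfterIdSql` is a hypothetical helper, not part of Laravel. It also inlines values for readability, whereas the query builder uses bound parameters.

```php
<?php
// Hypothetical helper (not Laravel): mirrors forPageAfterId() by adding
// "column > lastId" only when lastId is not null, then ordering and limiting by that column.
function pageAfterIdSql(string $baseWhere, int $perPage, ?int $lastId, string $column = 'id'): string
{
    $where = [$baseWhere];
    if ($lastId !== null) {
        // from the second page on: skip everything up to the last value seen
        $where[] = "`$column` > " . $lastId;
    }
    return 'select * from `table` where ' . implode(' and ', $where)
        . " order by `$column` asc limit $perPage";
}

$base = '`type` = 99 and `mark_time` between 1577836800 and 1577923199';
// First page: no "> last" condition, but order by `id` is still added.
echo pageAfterIdSql($base, 1000, null), PHP_EOL;
// Later page when a column is passed, as in the fix below.
echo pageAfterIdSql($base, 1000, 1577840000, 'mark_time'), PHP_EOL;
```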

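That explains it. The chunkById call above did not pass a column argument, so order by id was added automatically. The query walked the primary key index instead of using the mark_time index, which made it very slow.

The chunkById source also shows that we can pass a column argument, so that the query orders and pages by that column instead.

The modified code: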

$this->dao->where('type', 99)
    ->whereBetween('mark_time', [$date, $date + 86399])
    ->select(array('mark_time', 'id'))
    ->chunkById(1000, function ($rows) {
        // business logic
    }, 'mark_time');

The SQL that finally runs looks like this:

select * from `table` where `type` = 99 and mark_time between :begin_date and :end_date order by mark_time limit 500;

When I ran the script again, one run took only about ten seconds, a significant improvement.
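One caveat applies when paging on mark_time. The next page's condition, where column > lastValue, assumes the column is unique. mark_time is a timestamp with one-second resolution. At 150,000 to 250,000 rows per day, many rows share the same second. Any not-yet-returned rows with the same mark_time as the last row of a chunk are therefore skipped by the next page. The plain-PHP simulation below shows the effect. It uses no Laravel, and `keysetVisit` is a hypothetical helper. To page safely, the column passed to chunkById should be unique.

```php
<?php
// Hypothetical helper: simulates "where col > :last order by col limit $size" over the
// values of the paging column and returns how many rows end up being visited.
function keysetVisit(array $values, int $size): int
{
    sort($values);
    $visited = 0;
    $last = null;
    do {
        $batch = [];
        foreach ($values as $v) {
            if ($last === null || $v > $last) {
                $batch[] = $v;
                if (count($batch) === $size) {
                    break;
                }
            }
        }
        $visited += count($batch);
        if ($batch) {
            $last = end($batch); // like $results->last()->{$alias}
        }
    } while (count($batch) === $size);
    return $visited;
}

// Unique column (like id): all 6 rows are visited.
echo keysetVisit([1, 2, 3, 4, 5, 6], 2), PHP_EOL; // 6
// Non-unique column (like mark_time): the third row with value 100 is skipped.
echo keysetVisit([100, 100, 100, 101, 101, 102], 2), PHP_EOL; // 5
```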

VII. Summary

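The difference between chunk and chunkById: chunk fetches data purely by offset. chunkById is optimized to filter by id instead of using an offset, which is much faster. With large data sets, the gap can reach tens of times. chunk can also skip rows when the callback updates the data. For details, see the article "Solving the pitfall of processing data in chunks with Laravel's chunk method".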

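Also, when no column argument is passed, chunkById adds order by id by default. This can keep the query from using the index you expect. The fix is simply to pass the column argument.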

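In my view, chunkById does not only chunk by id: it can chunk by any column you specify. The name chunkById is somewhat misleading, and chunkByColumn might be easier to understand. That is my small suggestion.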

Title: Pitfalls of Laravel's chunk and chunkById

Author: 旺阳

Link: www.lqwang/13.html

Published: 2020-01-06
