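Python + Treelite: moving a trained Sklearn tree model to C and Java deployment
This post was inspired by:
Treelite: a deployment acceleration tool for tree models (supports XGBoost, LightGBM and Sklearn)
By Coggle, Coggle Data Science
Supported models: XGB, LGB and SKlearn tree models.
Another benefit: installing a machine learning package (XGBoost, LightGBM, scikit-learn and so on) on every machine that runs a tree model is a hassle.
With Treelite this is no longer necessary: it exports the model as a standalone prediction library, so predictions can be made without installing any machine learning package.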
1 Installation
python3 -m pip install --user treelite treelite_runtime
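2 Introduction to Treelite and how it works
Treelite compiles and optimizes a tree model into a standalone library that is easy to deploy. After optimization, XGBoost prediction can run 2 to 6 times faster.
[Figure: XGBoost prediction throughput at different batch sizes, with and without Treelite compilation]
In the figure above, the black curve is the throughput of XGBoost at different batch sizes, and the red curve is the throughput of the same XGBoost model after compilation with Treelite.
Treelite supports many kinds of tree models, in particular random forests and GBDT. It works well with XGBoost, LightGBM and scikit-learn, and custom models can also be compiled as long as they are specified in the required form.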
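To make the idea of "compiling a tree into a standalone library" more concrete, here is a minimal sketch that turns one small decision tree into C source made of nested if/else statements. The dictionary layout of the tree and the names `tree_to_c`, `compile_tree` and `example_tree` are assumptions made for this illustration; this is not Treelite's internal format or its actual code generator.

```python
# Hypothetical tree layout, used only for this sketch:
# an internal node is {"feature", "threshold", "left", "right"},
# a leaf is {"leaf": value}.
def tree_to_c(node, indent=1):
    """Recursively turn one decision tree into nested C if/else statements."""
    pad = "  " * indent
    if "leaf" in node:
        return f"{pad}return {node['leaf']};\n"
    return (
        f"{pad}if (data[{node['feature']}] < {node['threshold']}) {{\n"
        + tree_to_c(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + tree_to_c(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


def compile_tree(node, name="predict"):
    """Wrap the generated branches in a standalone C function."""
    return f"float {name}(const float* data) {{\n{tree_to_c(node)}}}\n"


example_tree = {
    "feature": 3, "threshold": 1.5,
    "left": {"leaf": 0.25},
    "right": {
        "feature": 0, "threshold": 0.5,
        "left": {"leaf": -0.75},
        "right": {"leaf": 1.0},
    },
}

c_source = compile_tree(example_tree)
print(c_source)
```

The generated text depends only on a C compiler, which is the property that lets a compiled model run on machines with no machine learning package installed.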
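2.1 Logical branches
In a tree model, the test at each node is essentially an if statement, and when the CPU executes an if statement it cannot know which branch to run until the condition has been evaluated.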
if ( [conditional expression] ) {
  foo();
} else {
  bar();
}
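If the number of samples that fall into each branch is recorded while the tree is being built, we know in advance which child node is more likely to be reached, and the logic of that child can be executed ahead of time.
Compiler hints make this speed-up possible: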
/* expected to be false */
if (__builtin_expect([condition], 0)) {
  ...
} else {
  ...
}
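As a rough illustration of how sample counts could drive the hint, the sketch below picks the expected outcome of a node's condition from the number of training samples on each side and emits the matching `__builtin_expect` line. The function name `annotate_branch` and the counts are hypothetical; this only shows the idea, not how Treelite generates its code.

```python
def annotate_branch(condition, left_count, right_count):
    """Return a C `if` line with a __builtin_expect hint.

    left_count / right_count are the numbers of training samples for which
    the condition was true / false at this node (hypothetical inputs).
    """
    # Expect "true" (1) when at least as many samples took the true branch.
    expected = 1 if left_count >= right_count else 0
    return f"if (__builtin_expect(({condition}), {expected})) {{"


# 120 samples satisfied the condition, 880 did not, so it is expected to be false.
line = annotate_branch("data[3] < 1.5", left_count=120, right_count=880)
print(line)
```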
2.2 Logical comparison
The original branch tests may be floating-point comparisons; these can be quantized into integer comparisons.
if (data[3].fvalue < 1.5) {
  /* floating-point comparison */
  ...
}
if (data[3].qvalue < 3) {
  /* integer comparison */
  ...
}
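One way to see why an integer comparison can replace a floating-point one is the sketch below. It maps each feature value to an integer that keeps its order relative to a sorted list of split thresholds, so that `fvalue < thresholds[k]` and `qvalue < 2 * k` always agree. The encoding (even numbers for thresholds, odd numbers for the gaps between them), the function name `quantize` and the threshold list are assumptions for this illustration, and missing values (NaN) are not handled.

```python
from bisect import bisect_left


def quantize(value, thresholds):
    """Map a float to an integer that preserves its order against `thresholds`.

    `thresholds` must be sorted and free of duplicates. A value equal to
    thresholds[i] maps to 2*i; any other value maps to the odd number that
    sits between its neighbouring thresholds.
    """
    i = bisect_left(thresholds, value)  # number of thresholds strictly below value
    if i < len(thresholds) and thresholds[i] == value:
        return 2 * i
    return 2 * i - 1


thresholds = [0.5, 1.5, 2.5]  # hypothetical split points for one feature
k = thresholds.index(1.5)     # the split "fvalue < 1.5" refers to thresholds[k]

for fvalue in [-1.0, 0.5, 1.0, 1.5, 2.0, 3.0]:
    qvalue = quantize(fvalue, thresholds)
    # The integer test gives the same answer as the floating-point test.
    assert (fvalue < thresholds[k]) == (qvalue < 2 * k)
```

The quantization is done once per input row, after which every node test on that feature is a plain integer comparison.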
3 Quick start
Import a tree ensemble model into Treelite:
import treelite
model = treelite.Model.load('del', model_format='xgboost')
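Deploy a source archive:
# Produce a zipped source directory, containing all model information
# Run `make` on the target machine
# (the platform and toolchain values are examples; adjust them to the target machine)
model.export_srcpkg(platform='unix', toolchain='gcc',
                    pkgpath='./mymodel.zip', libname='mymodel.so',
                    verbose=True)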
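Deploy a shared library:
# Like export_srcpkg, but generates a shared library immediately
# Use this only when the host and target machines are compatible
# (the toolchain value is an example)
model.export_lib(toolchain='gcc', libpath='./mymodel.so', verbose=True)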
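Make predictions on the target machine:
import treelite_runtime
# X is a 2-D NumPy array with one row per sample
predictor = treelite_runtime.Predictor('./mymodel.so', verbose=True)
# Batch is the older runtime API; in later treelite_runtime releases DMatrix takes its place
batch = treelite_runtime.Batch.from_npy2d(X)
out_pred = predictor.predict(batch)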
4 Loading the common model types: XGB, LGB, SKlearn
4.1 XGB
Load an XGBoost model from an xgboost.Booster object:
from treelite import Model
# bst = an object of type xgboost.Booster
model = Model.from_xgboost(bst)
Load an XGBoost model from a file in binary format:
# model had been saved to a file named del
# notice the second argument model_format='xgboost'
model = Model.load('del', model_format='xgboost')
4.2 LGB
LightGBM models (from Microsoft/LightGBM) can be loaded with load() by passing the argument model_format='lightgbm':
# model had been saved to a file named
# notice the second argument model_format='lightgbm'
model = Model.load('', model_format='lightgbm')
4.3 scikit-learn models
A fitted scikit-learn tree model object can be loaded as follows:
# clf is the model object generated by scikit-learn
import treelite.sklearn
model = treelite.sklearn.import_model(clf)
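5 Java version: Treelite4J
Treelite4J is the dependency used from Java. It locates the compiled model (dll / so / dylib) in the local file system. We load the compiled model by creating a Predictor object: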
import lite4j.Predictor;
Predictor predictor = new Predictor("path/to/compiled_model.so", -1, true, true);
Once the compiled model is loaded, we can query it:
// Get the input dimension, i.e. the number of feature values in the input vector
int num_feature = predictor.GetNumFeature();
// Get the number of classes.
// This number is 1 for tasks other than multi-class classification.
// For multi-class classification task, the number is equal to the number of classes.
int num_class = predictor.GetNumClass();
To make a prediction for a single input, we create an array of Entry objects, set their values, and call the predict function.
// Create an array of feature values for the input
int num_feature = predictor.GetNumFeature();
Entry[] inst = new Entry[num_feature];
// Initialize all feature values as missing
for (int i = 0; i < num_feature; ++i) {
  inst[i] = new Entry();
  inst[i].setMissing();
}
// Set feature values that are not missing
// In this example, we set feature 1, 3, and 7
inst[1].setFValue(-0.5);
inst[3].setFValue(3.2);
inst[7].setFValue(-1.7);
// Now run prediction
// (Put false in the second argument to get probability outputs)
float[] result = predictor.predict(inst, false);
// The result is either class probabilities (for multi-class classification)
// or a single number (for all other tasks, such as regression)
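The same "start with every feature missing, then fill in the known ones" pattern can be sketched in Python. Here NaN stands in for a missing value, which is an assumption of this sketch (Treelite4J marks missing entries with `setMissing()`); the function name `make_instance` and the feature count of 10 are likewise hypothetical.

```python
import math


def make_instance(num_feature, known_values):
    """Build a dense feature list in which unspecified features are NaN (missing)."""
    inst = [math.nan] * num_feature      # initialize all features as missing
    for index, value in known_values.items():
        inst[index] = value              # set the features that are present
    return inst


# Same feature values as in the Java example: features 1, 3 and 7 are set.
inst = make_instance(10, {1: -0.5, 3: 3.2, 7: -1.7})
missing = [i for i, v in enumerate(inst) if math.isnan(v)]
print(missing)
```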
