Pros and cons of three ways to save machine learning models: pickle, joblib, and PMML
joblib
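joblib is an efficient persistence library that the scikit-learn documentation recommends for saving models to disk (older scikit-learn versions bundled a copy as sklearn.externals.joblib). The saved file is binary.
Its main advantage is efficiency. Beyond persistence, joblib also offers:
· transparent disk caching of function results with lazy re-evaluation (the memoize pattern)
· simple parallel computing
Loading is also usually faster than with plain pickle, particularly for objects that hold large NumPy arrays.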
from sklearn2pmml import PMMLPipeline, sklearn2pmml
import joblib
import pickle
from sklearn.datasets import load_iris
from sklearn import tree

# Train a decision tree on the iris data set, wrapped in a PMML-capable pipeline
iris = load_iris()
clf = tree.DecisionTreeClassifier()
pipeline = PMMLPipeline([("classifier", clf)])
pipeline.fit(iris.data, iris.target)

# Save the fitted pipeline with joblib, then load it back
joblib.dump(pipeline, '20200607_decisiontree.pkl')
j1 = joblib.load('20200607_decisiontree.pkl')
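The caching and parallel features listed above are separate from persistence, so a small sketch may help tell them apart. It uses plain Python objects instead of a fitted model; the `params` dictionary, the `slow_square` function, and the temporary directory are made up for illustration.

```python
import os
import shutil
import tempfile

from joblib import Memory, Parallel, delayed, dump, load

workdir = tempfile.mkdtemp()

# 1) Persistence: write an object to disk (optionally compressed) and read it back.
params = {"max_depth": 3, "classes": ["setosa", "versicolor", "virginica"]}
path = os.path.join(workdir, "params.joblib")
dump(params, path, compress=3)
restored = load(path)

# 2) Memoization: results are cached on disk, keyed by the function arguments.
memory = Memory(location=os.path.join(workdir, "cache"), verbose=0)
call_log = []  # records every time the function body actually runs


@memory.cache
def slow_square(x):
    call_log.append(x)
    return x * x


first = slow_square(4)   # computed and stored in the cache
second = slow_square(4)  # served from the cache, the body does not run again

# 3) Simple parallelism: run independent calls on two worker threads.
squares = Parallel(n_jobs=2, prefer="threads")(
    delayed(pow)(i, 2) for i in range(5)
)

shutil.rmtree(workdir, ignore_errors=True)
```

Only the first part is needed for saving models; the other two are what the bullet points refer to.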
pickle
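pickle can be used in two ways: pickle.dumps(model) serializes the model to an in-memory bytes object (a str in Python 2), while with open(..., 'wb') as f: pickle.dump(model, f) writes the model to an open file.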
a1 = pickle.dumps(pipeline)
a2 = pickle.loads(a1)
a2
Output:
PMMLPipeline(steps=[('classifier', DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False, random_state=None,
splitter='best'))])
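Both forms can be tried without scikit-learn installed. The following is a minimal sketch that uses a plain dictionary as a stand-in for a model; the `params` values and the file name `params.pkl` are made up for illustration.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model (illustrative values only).
params = {"criterion": "gini", "max_depth": None, "thresholds": [0.8, 1.75, 4.95]}

# Form 1: serialize to an in-memory bytes object and restore from it.
blob = pickle.dumps(params)
from_bytes = pickle.loads(blob)

# Form 2: serialize to a file opened in binary mode and restore from it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.pkl")
    with open(path, "wb") as f:
        pickle.dump(params, f)
    with open(path, "rb") as f:
        from_file = pickle.load(f)

print(type(blob).__name__, from_bytes == params, from_file == params)
```

Note that pickle.load can execute arbitrary code while deserializing, so pickle files should only be loaded from trusted sources.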
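# The file name after './' is missing in the source
with open('./', 'wb') as f:
    pickle.dump(pipeline, f)
with open('./', 'rb') as f:
    a3 = pickle.load(f)
# Compare the joblib-loaded pipeline with the pickle-restored one
assert j1 == a2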
Output:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-7-09ffb1282370> in <module>
----> 1 assert(j1 == a2)
AssertionError:
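The assertion fails, so the object restored by pickle and the one loaded by joblib.load do not compare equal. This does not by itself mean the two models differ: scikit-learn estimators and pipelines do not define __eq__, so == falls back to identity comparison, and any two separately deserialized objects compare unequal. Comparing their predictions or fitted parameters is a more meaningful check.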
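The effect is easy to reproduce without scikit-learn. In this sketch, `random.Random` from the standard library is used purely as a stand-in for an object whose class does not override `__eq__`. It is an assumption of the example, not something the original experiment used.

```python
import pickle
import random

original = random.Random(42)

# Two independent round trips of the same object.
copy_a = pickle.loads(pickle.dumps(original))
copy_b = pickle.loads(pickle.dumps(original))

# `==` without a custom __eq__ only checks whether both names refer to the same object.
same_by_operator = (copy_a == copy_b)

# Comparing the internal state shows that the copies are in fact identical.
same_by_state = (copy_a.getstate() == copy_b.getstate())

# Comparing behaviour: both copies produce the same sequence of numbers.
same_by_behaviour = (
    [copy_a.random() for _ in range(5)] == [copy_b.random() for _ in range(5)]
)

print(same_by_operator, same_by_state, same_by_behaviour)
```

For fitted pipelines, the analogous behavioural check would be along the lines of numpy.array_equal(j1.predict(iris.data), a2.predict(iris.data)).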
pmml
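clf = tree.DecisionTreeClassifier()
pipeline = PMMLPipeline([("classifier", clf)])
pipeline.fit(iris.data, iris.target)
# Export the fitted pipeline to a PMML (XML) file; sklearn2pmml needs a Java runtime for the conversion
sklearn2pmml(pipeline, "DecisionTreeIris.pmml", with_repr=True)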
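Because a PMML file is plain XML, any language with an XML parser can inspect it, which is what makes the format attractive for cross-platform use. The sketch below parses a hand-written, heavily trimmed fragment that is not schema-valid and is not the actual output of sklearn2pmml; the field names `y`, `x1`, and `x2` are made up. The namespace is read from the root tag rather than hard-coded, since the PMML version in a real file may differ.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed fragment for illustration only (not real sklearn2pmml output).
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="y" optype="categorical" dataType="integer"/>
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TreeModel functionName="classification"/>
</PMML>
"""

root = ET.fromstring(PMML_TEXT)

# The root tag looks like "{namespace}PMML"; extract the namespace URI from it.
namespace_uri = root.tag[1:root.tag.index("}")]
ns = {"pmml": namespace_uri}

# List the declared data fields.
field_names = [
    field.get("name")
    for field in root.findall("pmml:DataDictionary/pmml:DataField", ns)
]

# List top-level elements whose local name ends in "Model".
model_tags = [
    child.tag.split("}", 1)[1]
    for child in root
    if child.tag.endswith("Model")
]

print(field_names, model_tags)
```

To inspect a real export, ET.parse("DecisionTreeIris.pmml").getroot() could replace the ET.fromstring call.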
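Efficiency
joblib is the most efficient, followed by pickle and PMML (which of these two ranks second and which third was not tested). For cross-platform deployment, choose PMML.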
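Since the ranking above was not measured, a small timing harness is one way to check it on a given machine. This is a rough sketch under stated assumptions: the `payload` is a made-up pure-Python object rather than a fitted model, joblib is treated as optional, and PMML export is left out because it needs sklearn2pmml and Java. Results for NumPy-heavy models may look quite different.

```python
import os
import pickle
import tempfile
import time

try:
    import joblib  # optional: the joblib timing is skipped if it is not installed
except ImportError:
    joblib = None


def time_roundtrip(dump, load, obj, path, repeats=5):
    """Return the best wall-clock time (seconds) of one dump + load cycle."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dump(obj, path)
        restored = load(path)
        best = min(best, time.perf_counter() - start)
    assert restored == obj  # sanity check: the round trip preserved the data
    return best


def pickle_dump(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def pickle_load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Made-up payload standing in for model parameters.
payload = {"weights": [i * 0.5 for i in range(200_000)]}

with tempfile.TemporaryDirectory() as tmp:
    timings = {
        "pickle": time_roundtrip(
            pickle_dump, pickle_load, payload, os.path.join(tmp, "m.pkl")
        )
    }
    if joblib is not None:
        timings["joblib"] = time_roundtrip(
            joblib.dump, joblib.load, payload, os.path.join(tmp, "m.joblib")
        )

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Replacing `payload` with the fitted pipeline from the earlier sections (and dropping the equality check inside `time_roundtrip`, since pipelines do not compare equal after a round trip) would give numbers for the actual model.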