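Converting Python Models to PMML
There are currently several ways to deploy a Python model:
- A Python web service framework such as flask; no cross-language work is needed.
- Java packages such as xgb4j and lgb4j; these are cross-language but only support xgb/lgb.
- PMML; cross-language, and supports every model that follows the sklearn interface.
In short, when a model has to be deployed across languages, PMML is the all-purpose option: any model with an sklearn interface can be converted to a PMML file and then parsed with Java/Scala packages. After some research, however, I found very little information online about how to convert a Python model to PMML, so I summarize the process here.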
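1. DataFrameMapper
- DataFrameMapper currently supports a number of classes from sklearn.preprocessing, such as MinMaxScaler() and OneHotEncoder().
- DataFrameMapper supports custom functions: FunctionTransformer() wraps a custom function so that it can be used in the same way as a class such as MinMaxScaler().
- DataFrameMapper supports cascaded transformations on a single column or on several columns.
- The functions in sklearn.preprocessing take numpy.ndarray input.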
mapper = DataFrameMapper([
    (["Sepal.Length"], FunctionTransformer(np.abs)),
    (["Sepal.Width"], [MinMaxScaler(), Imputer()]),
    (["Petal.Length"], None),
    (["Petal.Width"], OneHotEncoder()),
    (["Petal.Length", "Petal.Width"], [MinMaxScaler(), StandardScaler()])
])
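To make the cascading behaviour concrete, here is a minimal sketch that applies two transformers in sequence to a single NumPy column. This is roughly what a list such as [MinMaxScaler(), StandardScaler()] does inside one DataFrameMapper entry. It needs only scikit-learn and NumPy. As assumptions for illustration, make_pipeline stands in for the mapper's internal chaining and the sample values are made up.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

# One feature column as a 2-D numpy.ndarray of shape (n_samples, 1).
# The values are illustrative, not taken from the iris data.
X = np.array([[-1.0], [2.0], [-3.0]])

# FunctionTransformer wraps a plain function so that it exposes
# fit/transform in the same way MinMaxScaler does.
abs_step = FunctionTransformer(np.abs)
abs_out = abs_step.fit_transform(X)

# Chain two steps: the output of np.abs is fed into MinMaxScaler.
chained = make_pipeline(FunctionTransformer(np.abs), MinMaxScaler())
scaled = chained.fit_transform(X)

print(abs_out.ravel())
print(scaled.ravel())
```

The order of the list matters: scaling first and taking the absolute value afterwards would generally give a different result.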
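2. PMMLPipeline
PMMLPipeline supports transformers that operate on the whole feature matrix, such as PCA and SelectKBest, as well as a final model such as GBDT. A step can be used as long as it follows the sklearn interface, i.e. it implements fit and transform (the final model implements fit and predict). In theory, custom functions that follow these rules are supported as well.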
iris_pipeline = PMMLPipeline([
    ("mapper", mapper),
    ("pca", PCA(n_components=3)),
    ("selector", SelectKBest(k=2)),  # keep the k best features
    ("classifier", GBDT)  # GBDT is a GradientBoostingClassifier() instance
])
iris_pipeline.fit(df_x, y)
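The fit/transform chaining described in this section can be tried without sklearn2pmml or Java. The sketch below uses the plain sklearn.pipeline.Pipeline, which PMMLPipeline subclasses, with the same PCA, SelectKBest and GBDT steps on the iris data. The mapper step is left out, and n_estimators=10 and random_state=0 are arbitrary choices to keep the example fast and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("pca", PCA(n_components=3)),       # 4 raw features -> 3 components
    ("selector", SelectKBest(k=2)),     # keep the 2 best components
    ("classifier", GradientBoostingClassifier(n_estimators=10, random_state=0)),
])
pipe.fit(X, y)

# Every step except the last one is a transformer, so the pipeline
# without its final step can be used to look at the model's input.
features = pipe[:-1].transform(X)
predictions = pipe.predict(X)

print(features.shape, predictions.shape)
```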
3. sklearn2pmml
Save the fitted pipeline as a PMML file:
sklearn2pmml(iris_pipeline, savemodel, with_repr=True)
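sklearn2pmml performs the conversion by running a Java application, so the export fails if no Java runtime is installed. A small pre-flight check like the one below can make that failure easier to diagnose. The helper name java_available is hypothetical and not part of sklearn2pmml; the sketch uses only the Python standard library.

```python
import shutil


def java_available():
    """Return True if a `java` executable can be found on PATH."""
    return shutil.which("java") is not None


if java_available():
    print("Java found; the sklearn2pmml export can be attempted.")
else:
    print("Java not found; install a Java runtime before exporting.")
```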
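Other notes
DataFrameMapper offers only limited support for feature engineering, so there are two options. Feature engineering can be done separately, once offline and once online. Alternatively, it can be implemented with DataFrameMapper and exported into the model file, so that it does not have to be done again online.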
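The difference between the two options can be illustrated with plain scikit-learn. In the sketch below, option A keeps the scaler and the model as separate objects, so the serving side has to apply the same scaling itself. Option B bundles both into one pipeline that accepts raw input. The data is synthetic, and LogisticRegression is only a stand-in model chosen for brevity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic training data, for illustration only.
rng = np.random.RandomState(0)
X_train = rng.uniform(0, 100, size=(40, 2))
y_train = (X_train[:, 0] > 50).astype(int)
X_new = np.array([[10.0, 20.0], [90.0, 80.0]])  # raw, unscaled input

# Option A: feature engineering kept separate from the model.
# The online side must reproduce scaler.transform exactly.
scaler = MinMaxScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)
pred_separate = model.predict(scaler.transform(X_new))

# Option B: feature engineering bundled with the model.
# The online side passes raw values to a single object.
bundled = make_pipeline(MinMaxScaler(), LogisticRegression()).fit(X_train, y_train)
pred_bundled = bundled.predict(X_new)

print((pred_separate == pred_bundled).all())
```

Both options give the same predictions here. The practical difference is that option B leaves no scaling logic to reimplement on the serving side.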
Complete code
"""
Description: iris dataset
"""
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn2pmml import sklearn2pmml, PMMLPipeline
from sklearn2pmml.decoration import ContinuousDomain
from sklearn.feature_selection import SelectKBest
# frameworks for ML
from sklearn_pandas import DataFrameMapper
from sklearn.pipeline import make_pipeline
# transformers for category variables
from sklearn.preprocessing import LabelBinarizer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
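try:
    from sklearn.preprocessing import Imputer  # scikit-learn < 0.22
except ImportError:
    # Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it
    from sklearn.impute import SimpleImputer as Imputer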
# transformers for numerical variables
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Normalizer
# transformers for combined variables
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures
# user-defined transformers
from sklearn.preprocessing import FunctionTransformer
def read_data():
    # load the iris data
    data = load_iris()
    x = data.data
    y = data.target
    df_x = pd.DataFrame(x)
    df_x.columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]
    return df_x, y
def all_classifiers_test(savemodel='GBDT.pmml'):
    '''
    GBDT model
    '''
    GBDT = GradientBoostingClassifier()
    df_x, y = read_data()
    # feature engineering
    mapper = DataFrameMapper([
        (["Sepal.Length"], FunctionTransformer(np.abs)),
        (["Sepal.Width"], [MinMaxScaler(), Imputer()]),
        (["Petal.Length"], None),
        (["Petal.Width"], OneHotEncoder()),
        (["Petal.Length", "Petal.Width"], [MinMaxScaler(), StandardScaler()])
    ])
    # mapper = DataFrameMapper([
    #     (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [MinMaxScaler(), StandardScaler(), Imputer()])
    # ])
    iris_pipeline = PMMLPipeline([
        ("mapper", mapper),
        ("pca", PCA(n_components=3)),
        ("selector", SelectKBest(k=2)),  # keep the k best features
        ("classifier", GBDT)
    ])
    iris_pipeline.fit(df_x, y)
    # iris_pipeline.fit(X_train.values, y_train)
    # export the model file
    sklearn2pmml(iris_pipeline, savemodel, with_repr=True)


# Entry point added so that running the script performs the export.
if __name__ == "__main__":
    all_classifiers_test()
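Once a PMML file has been written, a quick sanity check is to list the input and output fields it declares. Since PMML is plain XML, the standard library is enough for this. The sketch below parses a tiny hand-written PMML fragment rather than a real export. The field names and the 4.4 namespace in the sample are assumptions for illustration, and the helper list_data_fields is hypothetical. It reads the namespace from the root tag so that it does not depend on the PMML version.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written PMML fragment, used only for illustration.
# A real export contains many more elements.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="Sepal.Length" optype="continuous" dataType="double"/>
    <DataField name="Sepal.Width" optype="continuous" dataType="double"/>
    <DataField name="y" optype="categorical" dataType="integer"/>
  </DataDictionary>
</PMML>
"""


def list_data_fields(pmml_text):
    """Return (name, dataType) pairs for every DataField in a PMML string."""
    root = ET.fromstring(pmml_text)
    # The root tag looks like "{namespace}PMML"; reuse its namespace
    # prefix so that any PMML version can be handled.
    prefix = ""
    if root.tag.startswith("{"):
        prefix = root.tag[: root.tag.index("}") + 1]
    return [
        (field.get("name"), field.get("dataType"))
        for field in root.iter(prefix + "DataField")
    ]


for name, data_type in list_data_fields(SAMPLE_PMML):
    print(name, data_type)
```

For the file exported by the script above, the text of GBDT.pmml could be read from disk and passed to the same helper.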
