Easy Hyperparameter Search Using Optunity
Marc Claesen **************************.be
Jaak Simm ***********************.be
Dusan Popovic ***************************.be
Yves Moreau *************************.be
Bart De Moor *************************.be
KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics
iMinds, Department of Medical Information Technologies
Kasteelpark Arenberg 10, box 2446, 3001 Leuven, Belgium

Abstract

Optunity is a free software package dedicated to hyperparameter optimization. It contains various types of solvers, ranging from undirected methods to direct search, particle swarm and evolutionary optimization. The design focuses on ease of use, flexibility, code clarity and interoperability with existing software in all machine learning environments. Optunity is written in Python and contains interfaces to environments such as R and MATLAB. Optunity uses a BSD license and is freely available online at www.optunity.

Keywords: hyperparameter search, black-box optimization, algorithm tuning, Python

1. Introduction

Many machine learning tasks aim to train a model M which minimizes some loss function L(M | X^(te)) on given test data X^(te). A model is obtained via a learning algorithm A which uses a training set X^(tr) and solves some optimization problem. The learning algorithm A may itself be parameterized by a set of hyperparameters λ, e.g. M = A(X^(tr) | λ). Hyperparameter search, also known as tuning, aims to find a set of hyperparameters λ*, such that the learning algorithm yields an optimal model M* that minimizes L(M | X^(te)):
\lambda^* = \arg\min_{\lambda} L\left( A(X^{(tr)} \mid \lambda) \mid X^{(te)} \right) = \arg\min_{\lambda} F\left( \lambda \mid A, X^{(tr)}, X^{(te)}, L \right) \qquad (1)
In the context of tuning, F is the objective function and λ is a tuple of hyperparameters (the optimization variables). The learning algorithm A and data sets X^(tr) and X^(te) are known. Depending on the learning task, X^(tr) and X^(te) may be labeled and/or equal to each other. The objective function often has a constrained domain (for example, regularization terms must be positive) and is assumed to be expensive to evaluate, black-box and non-smooth. Tuning hyperparameters is a recurrent task in many machine learning approaches. Some common hyperparameters that must be tuned are related to kernels, regularization, learning rates and network architecture. Tuning can be necessary in both supervised and unsupervised settings and may significantly impact the resulting model's performance.
General machine learning packages typically provide only basic tuning methods like grid search. The most common tuning approaches are grid search and manual tuning (Hsu et al., 2003; Hinton, 2012). Grid search suffers from the curse of dimensionality when the number of hyperparameters grows large, while manual tuning requires considerable expertise and leads to poor reproducibility, particularly when many hyperparameters are involved.
2. Optunity
Our software is a Swiss army knife for hyperparameter search. Optunity offers a series of configurable optimization methods and utility functions that enable efficient hyperparameter optimization. Only a handful of lines of code are necessary to perform tuning. Optunity should be used in tandem with existing machine learning packages that implement learning algorithms. The package uses a BSD license and is simple to deploy in any environment. Optunity has been tested in Python, R and MATLAB on Linux, OSX and Windows.
2.1 Functional overview
Optunity provides both simple routines for lay users and expert routines that enable fine-grained control of various aspects of the solving process. Basic tuning can be performed with minimal configuration, requiring only an objective function, an upper limit on the number of evaluations and box constraints on the hyperparameters to be optimized.
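As a minimal sketch of such a basic tuning call (the quadratic objective and the box constraints below are purely illustrative and not part of the package; only the maximize routine and its return convention follow the example in Section 2.1):

import optunity

# Illustrative objective: any function of the hyperparameters returning a score.
def objective(x, y):
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2  # maximum at x = 1, y = -2

# An evaluation budget and per-hyperparameter box constraints are all that is required.
best_pars, details, _ = optunity.maximize(objective, num_evals=50, x=[-5, 5], y=[-5, 5])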
The objective function must be defined by the user. It takes a hyperparameter tuple λ and typically involves three steps: (i) training a model M with λ, (ii) using M to predict a test set and (iii) computing some score or loss based on the predictions. In unsupervised tasks, the separation between (i) and (ii) need not exist, for example when clustering a data set.
Tuning involves a series of function evaluations until convergence or until a predefined maximum number of evaluations is reached. Optunity is capable of vectorizing evaluations in the working environment to speed up the process at the end user's discretion.
Optunity additionally provides k-fold cross-validation to estimate the generalization performance of supervised modeling approaches. The cross-validation implementation can account for strata and clusters.1 Finally, a variety of common quality metrics is available.
The code example below illustrates tuning an SVM with scikit-learn and Optunity.2
1  @optunity.cross_validated(x=data, y=labels, num_folds=10, num_iter=2)
2  def svm_auc(x_train, y_train, x_test, y_test, C, gamma):
3      model = sklearn.svm.SVC(C=C, gamma=gamma).fit(x_train, y_train)
4      decision_values = model.decision_function(x_test)
5      return optunity.metrics.roc_auc(y_test, decision_values)
6
7  optimal_pars, _, _ = optunity.maximize(svm_auc, num_evals=100, C=[0, 10], gamma=[0, 1])
8  optimal_model = sklearn.svm.SVC(**optimal_pars).fit(data, labels)
The objective function as per Equation (1) is defined on lines 1 to 5, where λ = (C, γ), A is the SVM training algorithm and L is the area under the ROC curve. We use 2× iterated 10-fold cross-validation to estimate the area under the ROC curve. Up to 100 hyperparameter tuples are tested within the box constraints 0 < C < 10 and 0 < γ < 1 on line 7.
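The example assumes the necessary imports and data are in place (see footnote 2). A minimal sketch of such a setup, where the synthetic data set is purely illustrative and not part of the original example, could be:

import optunity
import optunity.metrics
import sklearn.svm
import sklearn.datasets

# Illustrative data: any feature matrix and binary label vector will do.
data, labels = sklearn.datasets.make_classification(n_samples=200, random_state=0)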
1. Instances in a stratum should be spread across folds. Clustered instances must remain in a single fold.
2. We assume the correct imports are made and data and labels contain appropriate content.
2.2 Available solvers
Optunity provides a wide variety of solvers, ranging from basic, undirected methods like grid search and random search (Bergstra and Bengio, 2012) to evolutionary methods such as particle swarm optimization (Kennedy, 2010) and the covariance matrix adaptation evolutionary strategy (CMA-ES) (Hansen and Ostermeier, 2001). Finally, we provide the Nelder-Mead simplex (Nelder and Mead, 1965), which is useful for local search after a good region has been determined. Optunity's current default solver is particle swarm optimization, as our experiments have shown it to perform well for a large variety of tuning tasks involving various learning algorithms. Additional solvers will be incorporated in the future.
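A hedged sketch of selecting a specific solver by name is shown below; the solver_name keyword and the available_solvers helper are assumed here to match Optunity's documented API, and the simple quadratic objective stands in for a real tuning problem.

import optunity

# Illustrative stand-in for a real tuning objective (e.g. cross-validated AUC).
def objective(C, gamma):
    return -(C - 1.5) ** 2 - (gamma - 0.1) ** 2

# Assumption: available_solvers() lists solver names and solver_name picks one;
# 'particle swarm' is the paper's stated default solver.
print(optunity.available_solvers())
pars, _, _ = optunity.maximize(objective, num_evals=100,
                               solver_name='particle swarm',
                               C=[0, 10], gamma=[0, 1])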
2.3 Software design and implementation
The design philosophy of Optunity prioritizes code clarity over performance. This is justified by the fact that objective function evaluations constitute the real performance bottleneck.
In contrast to typical Python packages, we avoid dependencies on large packages like NumPy/SciPy and scikit-learn to facilitate users working in non-Python environments (sometimes at the cost of performance). To prevent issues for users who are unfamiliar with Python, care is taken to ensure all code in Optunity works out of the box on any Python version above 2.7, without requiring tools like 2to3 to make explicit conversions. Optunity has a single dependency on DEAP (Fortin et al., 2012) for the CMA-ES solver.
A key aspect of Optunity's design is interoperability with external environments. This requires bidirectional communication between Optunity's Python back-end (O) and the external environment (E) and roughly involves three steps: (i) E → O solver configuration, (ii) O ↔ E objective function evaluations and (iii) O → E solution and solver summary. To this end, Optunity can communicate with any environment via sockets using JSON messages, as shown in Figure 1. Only limited information must be communicated; large objects like data sets are never exchanged. To port Optunity to a new environment, a thin wrapper must be implemented to handle communication.
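To make the three-step message flow concrete, a rough sketch of such a thin wrapper is given below; the JSON field names, port and message shapes are hypothetical illustrations and do not reflect Optunity's actual wire format.

import json
import socket

# Hypothetical environment-side objective; the wrapper only sees hyperparameters and a value.
def evaluate(hyperparameters):
    return -(hyperparameters["C"] - 1.0) ** 2

sock = socket.create_connection(("localhost", 12345))  # port is illustrative
stream = sock.makefile("rw")

# (i) E -> O: solver configuration
stream.write(json.dumps({"maximize": "objective", "num_evals": 100,
                         "box": {"C": [0, 10]}}) + "\n")
stream.flush()

# (ii) O <-> E: objective function evaluations until the solver finishes
for line in stream:
    msg = json.loads(line)
    if "solution" in msg:  # (iii) O -> E: solution and solver summary
        print(msg["solution"])
        break
    value = evaluate(msg["hyperparameters"])
    stream.write(json.dumps({"value": value}) + "\n")
    stream.flush()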
Figure 1: Integrating Optunity in non-Python environments. (Figure content: Optunity's generic solvers, namely grid search, random search, Nelder-Mead, particle swarm, CMA-ES and others, are exposed to external environments such as R, MATLAB and Java, or to an arbitrary method.)
2.4 Documentation
Code is documented using Sphinx and contains many doctests that can serve as both unit tests and examples of the associated functions. Our website contains API documentation, user documentation and a wide range of examples to illustrate all aspects of the software. The examples involve various packages, including scikit-learn (Pedregosa et al., 2011), OpenCV (Bradski, 2000) and Spark's MLlib (Zaharia et al., 2010).
2.5 Collaborative and future development
Collaborative development is organized via GitHub.3 The project's master branch is kept stable and is subjected to continuous integration tests using Travis CI. We recommend that prospective users clone the master branch for the most up-to-date stable version of the software. Bug reports and feature requests can be filed via issues on GitHub.
Future development efforts will focus on wrappers for Java, Julia and C/C++. This will make Optunity readily available in all main environments related to machine learning. We additionally plan to incorporate Bayesian optimization strategies (Jones et al., 1998).
3. Related work
A number of software solutions exist for hyperparameter search. HyperOpt offers random search and sequential model-based optimization (Bergstra et al., 2013). Some packages dedicated to Bayesian approaches include Spearmint (Snoek et al., 2012), DiceKriging (Roustant et al., 2012) and BayesOpt (Martinez-Cantin, 2014). Finally, ParamILS is a command-line-only tuning framework providing iterated local search (Hutter et al., 2009).
Optunity distinguishes itself from existing packages by exposing a variety of fundamentally different solvers. This matters because the no free lunch theorem suggests that no single approach is best in all settings (Wolpert and Macready, 1997). Additionally, Optunity is easy to integrate in various environments and features a very simple API.

Acknowledgments
This research was funded via the following channels:
• Research Council KU Leuven: GOA/10/09 MaNet, CoE PFV/10/016 SymBioSys;
• Flemish Government: FWO: projects: G.0871.12N (Neural circuits); IWT: TBM Logic Insulin (100793), TBM Rectal Cancer (100783), TBM IETA (130256), O&O ExaScience Life Pharma, ChemBioBridge, PhD grants (specifically 111065); Industrial Research fund (IOF): IOF/HB/13/027 Logic Insulin; iMinds Medical Information Technologies SBO 2014; VLK Stichting E. van der Schueren: rectal cancer;
• Federal Government: FOD: Cancer Plan 2012-2015 KPC-29-023 (prostate);
• COST: Action: BM1104: Mass Spectrometry Imaging
References
James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(1):281–305, 2012.
James Bergstra, Dan Yamins, and David D. Cox. Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference, pages 13–20. SciPy, 2013.
3. We maintain the following subdomains for convenience: {builds,docs,git,issues}.optunity.
G. Bradski. The OpenCV library. Dr. Dobb's Journal of Software Tools, 2000. URL www.drdobbs/open-source/the-opencv-library/184404319.
Félix-Antoine Fortin, François-Michel De Rainville, Marc-André Gardner, Marc Parizeau, Christian Gagné, et al. DEAP: Evolutionary algorithms made easy. Journal of Machine Learning Research, 13(1):2171–2175, 2012.
Nikolaus Hansen and Andreas Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.
Geoffrey E. Hinton. A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade, pages 599–619. Springer, 2012.
Chih-Wei Hsu, Chih-Chung Chang, Chih-Jen Lin, et al. A practical guide to support vector classification, 2003.
Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown, and Thomas Stützle. ParamILS: an automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36(1):267–306, 2009.
Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.
James Kennedy. Particle swarm optimization. In Encyclopedia of Machine Learning, pages 760–766. Springer, 2010.
Ruben Martinez-Cantin. BayesOpt: A Bayesian optimization library for nonlinear optimization, experimental design and bandits. arXiv preprint arXiv:1405.7430, 2014.
John A. Nelder and Roger Mead. A simplex method for function minimization. The Computer Journal, 7(4):308–313, 1965.
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
Olivier Roustant, David Ginsbourger, Yves Deville, et al. DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. 2012.
Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959, 2012.
David H. Wolpert and William G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.
Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, pages 1–7, 2010.