Random Forests in Python
Random forest is a highly versatile machine learning method with numerous applications ranging from marketing to healthcare and insurance. It can be used to model customer acquisition, retention, and churn, or to predict disease risk in patients.
Random forest is capable of regression and classification. It can handle a large number of features, and it’s helpful for estimating which of your variables are important in the underlying data being modeled.
This is a post about random forests using Python.
What is a Random Forest?
Random forest is a solid choice for nearly any prediction problem (even non-linear ones). It’s a relatively new machine learning strategy (it came out of Bell Labs in the 90s) and it can be used for just about anything. It belongs to a larger class of machine learning algorithms called ensemble methods.
Ensemble Learning
Ensemble learning involves the combination of several models to solve a single prediction problem. It works by generating multiple classifiers/models which learn and make predictions independently. Those predictions are then combined into a single (mega) prediction that should be as good as or better than the prediction made by any one classifier.
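As a rough illustration of the idea (my own sketch, not code from the original post), the snippet below combines three arbitrary scikit-learn classifiers with a hard-voting ensemble on synthetic data; the dataset and the choice of base models are purely assumptions for demonstration.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three models that learn and predict independently, combined by majority vote.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # each model casts one vote; the majority label wins
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))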
Random forest is a brand of ensemble learning, as it relies on an ensemble of decision trees.
Randomized Decision Trees
So we know that random forest is an aggregation of other models, but what types of models is it aggregating? As you might have guessed from its name, random forest aggregates decision trees. A decision tree is composed of a series of decisions that can be used to classify an observation in a dataset.
Random Forest
The algorithm to induce a random forest will create a bunch of random decision trees automatically. Since the trees are generated at random, most won’t be all that meaningful to learning your classification/regression problem (maybe 99.9% of trees).
For example, if an observation has a length of 45, blue eyes, and 2 legs, it might be classified as red.
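The original post illustrates this with a small decision tree figure; since that figure isn’t reproduced here, the following is a hypothetical stand-in that trains a tiny scikit-learn decision tree on invented data with the same three features (length, eye color, legs) and classifies the observation above.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented training data: [length, has_blue_eyes (1/0), number_of_legs]
X = np.array([
    [40, 1, 2],
    [50, 1, 2],
    [45, 1, 2],
    [10, 0, 4],
    [12, 0, 4],
    [30, 0, 2],
])
y = np.array(["red", "red", "red", "green", "green", "green"])

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=["length", "blue_eyes", "legs"]))

# The observation described above: length 45, blue eyes, 2 legs.
print(tree.predict([[45, 1, 2]]))  # comes out as "red" on this toy data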
Arboreal Voting
So what good are 10000 (probably) bad models? Well it turns out that they really aren’t that helpful. But what is helpful are the few really good decision trees that you also generated along with the bad ones.
When you make a prediction, the new observation gets pushed down each decision tree and assigned a predicted value/label. Once each of the trees in the forest has reported its predicted value/label, the predictions are tallied up and the mode vote of all trees is returned as the final prediction.
Simply, the 99.9% of trees that are irrelevant make predictions that are all over the map and cancel each other out. The predictions of the minority of trees that are good rise above that noise and yield a good prediction.
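As a rough sketch of that tallying step (again my own illustration, not the post’s code), the snippet below fits a scikit-learn RandomForestClassifier, pushes one observation down every tree, and takes the majority vote. Note that scikit-learn’s forest actually averages class probabilities rather than taking a strict mode, so the two can differ on rare occasions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=1)
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

new_obs = X[:1]  # stand-in for a new observation

# Push the observation down every tree; each sub-tree predicts an encoded
# class index, so map it back through the forest's classes_.
tree_votes = np.array([
    forest.classes_[int(tree.predict(new_obs)[0])] for tree in forest.estimators_
])

# Tally the votes and take the most common label.
labels, counts = np.unique(tree_votes, return_counts=True)
print("majority vote of the trees:", labels[np.argmax(counts)])
print("forest's prediction:", forest.predict(new_obs)[0])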
Why should I use it?
It’s Easy
Random forest is the Leatherman of learning methods. You can throw pretty much anything at it and it’ll do a serviceable job. It does a particularly good job of estimating inferred transformations, and, as a result, doesn’t require much tuning like SVM (i.e. it’s good for folks with tight deadlines).
An Example Transformation
Random forest is capable of learning without carefully crafted data transformations. Take the f(x) = log(x) function, for example.
Alright, let’s write some code. We’ll be writing our Python code in Yhat’s very own interactive environment built for analyzing data, Rodeo. You can download Rodeo for Mac, Windows, or Linux [here](www.yhat/products/rodeo).
First, create some fake data and add a little noise.
import numpy as np
import pylab as pl

# Fake data: x is uniform on [1, 100], y is log(x) plus Gaussian noise.
x = np.random.uniform(1, 100, 1000)
y = np.log(x) + np.random.normal(0, .3, 1000)

pl.scatter(x, y, s=1, label="log(x) with noise")
pl.plot(np.arange(1, 100), np.log(np.arange(1, 100)), c="b", label="log(x) true function")
pl.xlabel("x")
pl.ylabel("f(x) = log(x)")
pl.legend(loc="best")
pl.title("A Basic Log Function")
pl.show()
Following along in Rodeo? Here’s what you should see.
Let’s take a closer look at that plot.
If we try and build a basic linear model to predict y using x, we wind up with a straight line that sort of bisects the log(x) function. Whereas if we use a random forest, it does a much better job of approximating the log(x) curve and we get something that looks much more like the true function.
You could argue that the random forest overfits the log(x) function a little bit. Either way, I think this does a nice job of illustrating how the random forest isn’t bound by linear constraints.
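One possible way to reproduce that comparison (a sketch under my own assumptions, not the post’s code) is to fit a plain linear regression and a random forest regressor to the same kind of noisy data and overlay their predictions; settings such as n_estimators and the random seed are arbitrary.

import numpy as np
import pylab as pl
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# The same kind of fake data as above: y = log(x) plus Gaussian noise.
np.random.seed(42)
x = np.random.uniform(1, 100, 1000)
y = np.log(x) + np.random.normal(0, .3, 1000)
X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

lm = LinearRegression().fit(X, y)
rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)

# Overlay the true function and both models' predictions on a grid.
grid = np.arange(1, 100).reshape(-1, 1)
pl.scatter(x, y, s=1, c="gray", label="log(x) with noise")
pl.plot(grid.ravel(), np.log(grid.ravel()), c="b", label="true function")
pl.plot(grid.ravel(), lm.predict(grid), c="r", label="linear model")
pl.plot(grid.ravel(), rf.predict(grid), c="g", label="random forest")
pl.legend(loc="best")
pl.title("Linear Model vs. Random Forest on log(x)")
pl.show()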
Uses
Variable Selection
One of the best use cases for random forest is feature selection. One of the byproducts of trying lots of decision tree variations is that you can examine which variables are working best/worst in each tree.
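A common pattern for this is to fit a forest and rank variables by its feature_importances_ attribute; the sketch below uses scikit-learn’s built-in iris data purely as a stand-in dataset.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Illustrative only: rank the iris features by their importance in the forest.
iris = load_iris()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(iris.data, iris.target)

for name, importance in sorted(
    zip(iris.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
):
    print(f"{name}: {importance:.3f}")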
