Implementing the Random Forest Algorithm in R (with Detailed Code) -- 688IT编程网

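Random forests are among the machine learning algorithms we reach for most often. In this article we use a random forest model to predict, from a set of physical measurements, whether an abalone is "old". The abalone data come from the UCI Machine Learning Repository, and we split them into a training set and a test set.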

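Contents:

1. Data preparation (loading the data, basic processing)

2. Data splitting (into a training set and a test set)

3. Variable selection

4. Model fitting and evaluation (confusion matrix, ROC curve, etc.)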

First, we load the data into R:

# load the required packages
library(caret)
library(ranger)
library(tidyverse)
library(e1071)

# read in the data
abalone_data <- read.table("../data/abalone.data", sep = ",")

# assign the column names
colnames(abalone_data) <- c("sex", "length", "diameter", "height",
                            "whole.weight", "shucked.weight",
                            "viscera.weight", "shell.weight", "age")

# define the binary outcome: an abalone is "old" if its age exceeds 10
abalone_data <- abalone_data %>%
  mutate(old = age > 10) %>%
  # remove the "age" variable
  select(-age)

# split the data into a training set (90%) and a test set (10%)
set.seed(23489)
train_index <- sample(1:nrow(abalone_data), 0.9 * nrow(abalone_data))
abalone_train <- abalone_data[train_index, ]
abalone_test <- abalone_data[-train_index, ]

# remove the original dataset
rm(abalone_data)

# view the first 6 rows of the training data
head(abalone_train)

Running this prints the first six rows of the training data.
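The split above draws row indices uniformly at random, so the share of old abalone can differ slightly between the training and test sets. As an optional alternative, a stratified split can be sketched with caret's createDataPartition. The data frame `toy` and its columns are made up for illustration and are not the abalone data.

```r
library(caret)

set.seed(23489)

# Hypothetical toy data standing in for abalone_data (not the real dataset)
toy <- data.frame(
  length = runif(200),
  old    = rep(c(TRUE, FALSE), times = c(70, 130))
)

# createDataPartition samples within each class of the outcome,
# so both subsets keep roughly the same TRUE/FALSE proportions
train_index <- createDataPartition(factor(toy$old), p = 0.9, list = FALSE)

toy_train <- toy[train_index, ]
toy_test  <- toy[-train_index, ]

# Compare the class balance of the two subsets
prop.table(table(toy_train$old))
prop.table(table(toy_test$old))
```

Whether stratification matters depends on how unbalanced the outcome is; with a 90/10 split of a few thousand rows, the simple random split used in this article is usually close enough.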

Next, we fit a random forest model:

rf_fit <- train(as.factor(old) ~ .,
                data = abalone_train,
                method = "ranger")

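By default, when train is called without further arguments it refits the model on 25 bootstrap samples for each of 3 candidate values of the tuning parameter. For ranger, the main tuning parameter is mtry, the number of predictors randomly sampled as split candidates at each split of a tree; the output below also varies the split rule.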
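To make "bootstrap sample" concrete, here is a base-R sketch of a single resample: rows are drawn with replacement, and the rows that were never drawn are held out for scoring. The row count 3759 is the training set size used in this article; the variable names are illustrative, and this is a sketch of the idea rather than caret's internal code.

```r
set.seed(1)

n <- 3759  # size of the training set used in this article

# One bootstrap resample: draw n row indices with replacement
boot_index <- sample(seq_len(n), size = n, replace = TRUE)

# Rows that were never drawn form the held-out set for this resample
holdout_index <- setdiff(seq_len(n), boot_index)

length(boot_index)         # always n, one entry per drawn row
length(holdout_index) / n  # roughly exp(-1), i.e. a bit over a third
```

Repeating this 25 times and averaging the held-out accuracy gives the kind of resampled performance estimate that train reports for each parameter setting.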

rf_fit

## Random Forest
##
## 3759 samples
##    8 predictor
##    2 classes: 'FALSE', 'TRUE'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 3759, 3759, 3759, 3759, 3759, 3759, ...
## Resampling results across tuning parameters:
##
##   mtry  splitrule   Accuracy   Kappa
##   2     gini        0.7828887  0.5112202
##   2     extratrees  0.7807373  0.4983028
##   5     gini        0.7750120  0.4958132
##   5     extratrees  0.7806244  0.5077483
##   9     gini        0.7681104  0.4819231
##   9     extratrees  0.7784264  0.5036977
##
## Tuning parameter 'min.node.size' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were mtry = 2, splitrule = gini
##  and min.node.size = 1.
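The defaults reported above can also be spelled out explicitly through trainControl and a tuning grid. The following is a minimal sketch, assuming caret and ranger are installed. It uses the built-in iris data as a stand-in because it ships with R, and the grid values are illustrative choices, not recommendations.

```r
library(caret)
library(ranger)

set.seed(23489)

# Same resampling scheme as caret's default: 25 bootstrap resamples
boot_control <- trainControl(method = "boot", number = 25)

# Explicit grid over the three parameters caret tunes for ranger
# (iris has 4 predictors, so mtry cannot exceed 4)
rf_grid <- expand.grid(
  mtry          = c(2, 3, 4),
  splitrule     = c("gini", "extratrees"),
  min.node.size = 1
)

rf_fit_iris <- train(Species ~ .,
                     data      = iris,
                     method    = "ranger",
                     trControl = boot_control,
                     tuneGrid  = rf_grid)

rf_fit_iris$bestTune  # the selected parameter combination
rf_fit_iris$results   # resampled Accuracy and Kappa for every grid row
```

Swapping `method = "boot"` for `method = "cv"` with `number = 10` would switch the same call to 10-fold cross-validation.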

Testing the model on an independent test set is just as easy with the built-in predict function.

# predict the outcome on a test set
abalone_rf_pred <- predict(rf_fit, abalone_test)

# compare predicted outcome and true outcome
confusionMatrix(abalone_rf_pred, as.factor(abalone_test$old))

## Confusion Matrix and Statistics
##
##           Reference
## Prediction FALSE TRUE
##      FALSE   231   52
##      TRUE     42   93
##
##                Accuracy : 0.7751
##                  95% CI : (0.732, 0.8143)
##     No Information Rate : 0.6531
##     P-Value [Acc > NIR] : 3.96e-08
##
##                   Kappa : 0.4955
##  Mcnemar's Test P-Value : 0.3533
##
##             Sensitivity : 0.8462
##             Specificity : 0.6414
##          Pos Pred Value : 0.8163
##          Neg Pred Value : 0.6889
##              Prevalence : 0.6531
##          Detection Rate : 0.5526
##    Detection Prevalence : 0.6770
##       Balanced Accuracy : 0.7438
##
##        'Positive' Class : FALSE
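To see where these statistics come from, the main ones can be recomputed by hand from the four cell counts. This is a base-R sketch; the variable names are our own, and the counts are copied from the confusion matrix above, where FALSE is treated as the positive class.

```r
# Counts copied from the confusion matrix above
# (rows = prediction, columns = reference; positive class is FALSE)
tp <- 231  # predicted FALSE, truly FALSE
fn <- 42   # predicted TRUE,  truly FALSE
fp <- 52   # predicted FALSE, truly TRUE
tn <- 93   # predicted TRUE,  truly TRUE
n  <- tp + fn + fp + tn

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
balanced    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for chance agreement
p_chance    <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2
cohen_kappa <- (accuracy - p_chance) / (1 - p_chance)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, balanced = balanced,
        kappa = cohen_kappa), 4)
```

The rounded values reproduce the Accuracy, Sensitivity, Specificity, Balanced Accuracy and Kappa lines of the report. Note that sensitivity here measures how well the model recognises abalone that are not old, because FALSE is the positive class.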

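We have now seen how to fit the model together with caret's default resampling scheme (the bootstrap) and parameter selection. That is already useful, but caret can do a lot more.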

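Pre-processing (preProcess)

caret makes many pre-processing steps easy to carry out. Several stand-alone caret functions target specific problems that can arise when setting up a model. These include: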

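dummyVars: creates dummy variables from categorical variables with multiple categories

nearZeroVar: identifies zero-variance and near-zero-variance predictors (which can cause problems when the data are subsampled)

findCorrelation: identifies highly correlated predictors

findLinearCombos: identifies linear dependencies among predictors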
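As a rough illustration of three of these helpers, the sketch below runs them on a small made-up data frame. The data frame `toy` and all of its columns are hypothetical and were constructed so that each function has something to find.

```r
library(caret)

set.seed(42)

# Hypothetical toy predictors, made up for illustration
n  <- 100
x1 <- rnorm(n)
toy <- data.frame(
  x1       = x1,
  x2       = x1 + rnorm(n, sd = 0.01),  # almost a copy of x1
  x3       = rnorm(n),                  # unrelated to x1 and x2
  constant = rep(1, n),                 # zero variance
  group    = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)

# Column indices of zero- and near-zero-variance predictors
nzv_cols <- nearZeroVar(toy)
names(toy)[nzv_cols]

# Among the numeric predictors, indices of columns suggested for removal
# because their pairwise correlation exceeds the cutoff
num      <- toy[, c("x1", "x2", "x3")]
high_cor <- findCorrelation(cor(num), cutoff = 0.9)
names(num)[high_cor]

# Expand the factor into one indicator (dummy) column per level
dv      <- dummyVars(~ group, data = toy)
dummies <- predict(dv, newdata = toy)
head(dummies)
```

Each function only reports or encodes; dropping the flagged columns is left to the user, for example with `toy[, -nzv_cols]`.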

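Besides these stand-alone functions, there is also preProcess, which handles the more common tasks such as centering and scaling, imputation and transformation. preProcess takes the data frame to be processed and one or more methods, which can be any of "BoxCox", "YeoJohnson", "expoTrans", "center", "scale", "range", "knnImpute", "bagImpute", "medianImpute", "pca", "ica", "spatialSign", "corr", "zv", "nzv" and "conditionalX".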

# center and scale the predictors
# identify and remove variables with near zero variance
# perform pca
abalone_no_nzv_pca <- preProcess(select(abalone_train, -old),
                                 method = c("center", "scale", "nzv", "pca"))
abalone_no_nzv_pca

## Created from 3759 samples and 8 variables
##
## Pre-processing:
##   - centered (7)
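A preProcess object only stores the estimated transformation; the data themselves are transformed by calling predict on it. The sketch below shows that second step on the built-in iris data, which is used purely as a stand-in because it ships with R.

```r
library(caret)

# iris ships with R and is used here only as a stand-in dataset
predictors <- iris[, 1:4]

# Estimate the centering, scaling and PCA parameters from the data
pp <- preProcess(predictors, method = c("center", "scale", "pca"))

# Apply the stored transformation; the result has columns PC1, PC2, ...
predictors_pca <- predict(pp, newdata = predictors)
head(predictors_pca)
```

In the abalone workflow the analogous call would presumably be `predict(abalone_no_nzv_pca, abalone_train)`, with the same object then reused on `abalone_test` so that the test set is transformed with parameters estimated on the training set only.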
