Using Linear Regression to Predict Data in Python -- 688IT编程网


1) Predicting house prices

We have the following dataset:

No.   Square feet   Price
1     150           6450
2     200           7450
3     250           8450
4     300           9450
5     350           11450
6     400           15450
7     600           18450

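Steps:

In linear regression, we need to find a linear relationship in the data so that we can obtain θ0 and θ1. Our hypothesis equation is:

hθ(x) = θ0 + θ1·x

where hθ(x) is the price for a given square footage (the value we want to predict), meaning that price is a linear function of square footage; θ0 is a constant (the intercept); and θ1 is the regression coefficient (the slope).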
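To make the roles of θ0 and θ1 concrete, here is a minimal sketch of the closed-form least-squares fit for a single input variable, written in plain Python. The function name fit_line is hypothetical and is not part of the original program, which relies on scikit-learn for the fit; the data are the seven rows of the dataset above.

```python
# Closed-form simple linear regression (ordinary least squares), pure Python.
# Data: the seven (square feet, price) rows of the dataset.
square_feet = [150, 200, 250, 300, 350, 400, 600]
price = [6450, 7450, 8450, 9450, 11450, 15450, 18450]


def fit_line(xs, ys):
    """Return (theta0, theta1) minimizing the squared error of theta0 + theta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = s_xy / s_xx
    # The fitted line always passes through the point (mean_x, mean_y).
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


theta0, theta1 = fit_line(square_feet, price)
print("theta0 =", round(theta0, 4))
print("theta1 =", round(theta1, 4))
```

The two printed values can be compared with the intercept and coefficient that the scikit-learn model reports later in this walkthrough.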

Now let's start coding:

Step 1

Open your favorite text editor and create a file named predict_house_price.py. Our program uses the following packages, so copy the code below into predict_house_price.py.

# Required Packages
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model

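Run your code. If it runs without errors, Step 1 is essentially done.

Step 2

I stored the data in a .csv file named input_data.csv, so let's write a function that converts the data into X values (square feet) and Y values (price).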

# Function to get data
def get_data(file_name):
    data = pd.read_csv(file_name)
    X_parameter = []
    Y_parameter = []
    for single_square_feet, single_price_value in zip(data['square_feet'], data['price']):
        X_parameter.append([float(single_square_feet)])
        Y_parameter.append(float(single_price_value))
    return X_parameter, Y_parameter

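The pd.read_csv call reads the .csv data into a Pandas DataFrame.

The loop converts the DataFrame columns into the X_parameter and Y_parameter lists, which the function then returns. Each X value is wrapped in its own list because scikit-learn expects the inputs as a 2D structure (one row per sample).

So let's print X_parameter and Y_parameter:

Script output:
[[150.0], [200.0], [250.0], [300.0], [350.0], [400.0], [600.0]]
[6450.0, 7450.0, 8450.0, 9450.0, 11450.0, 15450.0, 18450.0]
[Finished in 0.7s]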
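The original does not show the contents of input_data.csv or the call that prints the lists. The following is a minimal sketch under two assumptions: the file has the header row square_feet,price (the column names that get_data reads), and, to keep the example self-contained, the loader uses the standard-library csv module instead of pandas. The name get_data_stdlib is hypothetical.

```python
import csv
import os
import tempfile

# The seven (square feet, price) rows of the dataset.
rows = [(150, 6450), (200, 7450), (250, 8450), (300, 9450),
        (350, 11450), (400, 15450), (600, 18450)]


def get_data_stdlib(file_name):
    """Read the CSV into the same shapes as get_data: [[x], ...] and [y, ...]."""
    x_parameter, y_parameter = [], []
    with open(file_name, newline="") as f:
        for row in csv.DictReader(f):
            x_parameter.append([float(row["square_feet"])])
            y_parameter.append(float(row["price"]))
    return x_parameter, y_parameter


# Write the file into a temporary directory so nothing on disk is overwritten.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "input_data.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["square_feet", "price"])  # assumed header row
        writer.writerows(rows)
    X, Y = get_data_stdlib(path)

print(X)
print(Y)
```

With the pandas-based get_data from the tutorial, the equivalent call would be X, Y = get_data('input_data.csv') followed by the same two print calls.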

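Step 3

Now let's fit X_parameters and Y_parameters to a linear regression model. We will write a function that takes X_parameters, Y_parameters, and the square-footage value you want to predict as input, and returns θ0, θ1, and the predicted price.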

# Function for fitting our data to a linear model
def linear_model_main(X_parameters, Y_parameters, predict_value):
    # Create linear regression object
    regr = linear_model.LinearRegression()
    regr.fit(X_parameters, Y_parameters)
    # predict() expects a 2D input: one row per sample, one column per feature
    predict_outcome = regr.predict([[predict_value]])
    predictions = {}
    predictions['intercept'] = regr.intercept_
    predictions['coefficient'] = regr.coef_
    predictions['predicted_value'] = predict_outcome
    return predictions

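The first two statements create a linear model and train it with our X_parameters and Y_parameters.

The rest of the function builds a dictionary named predictions that holds θ0, θ1, and the predicted value, and returns it as the output.

So let's call our function with 700 as the square-footage value to predict.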

X, Y = get_data('input_data.csv')
predictvalue = 700
result = linear_model_main(X, Y, predictvalue)
print("Intercept value ", result['intercept'])
print("coefficient", result['coefficient'])
print("Predicted value: ", result['predicted_value'])

Script output:
Intercept value  1771.80851064
coefficient [ 28.77659574]
Predicted value:  [ 21915.42553191]
[Finished in 0.7s]

Here, the intercept value is θ0 and the coefficient value is θ1. We get a predicted price of 21915.4255, which means our house-price prediction is done!
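As a quick sanity check, the predicted value can be reproduced by hand from the hypothesis hθ(x) = θ0 + θ1·x. The sketch below simply plugs in the rounded intercept and coefficient from the script output, so the result agrees with the model's prediction only up to rounding.

```python
# Values reported in the script output (rounded as printed there).
theta0 = 1771.80851064   # intercept
theta1 = 28.77659574     # coefficient
square_feet = 700

# h(x) = theta0 + theta1 * x
predicted_price = theta0 + theta1 * square_feet
print(round(predicted_price, 4))
```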

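To verify the result, we need to see how well the linear regression fits our data. So let's write a function that takes X_parameters and Y_parameters as input and plots the data together with the fitted line.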

# Function to show the results of the linear fit model
def show_linear_line(X_parameters, Y_parameters):
    # Create linear regression object
    regr = linear_model.LinearRegression()
    regr.fit(X_parameters, Y_parameters)
    # Blue dots: the observed data; red line: the fitted model
    plt.scatter(X_parameters, Y_parameters, color='blue')
    plt.plot(X_parameters, regr.predict(X_parameters), color='red', linewidth=4)
    plt.show()

Now let's call the show_linear_line function:

show_linear_line(X,Y)
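The plot gives a visual impression of the fit. A numeric complement, which is not part of the original tutorial, is to look at the residuals and the coefficient of determination R². The sketch below is one way to do that in plain Python, assuming the rounded intercept and coefficient reported earlier.

```python
# Dataset and the (rounded) fitted parameters reported earlier.
square_feet = [150, 200, 250, 300, 350, 400, 600]
price = [6450, 7450, 8450, 9450, 11450, 15450, 18450]
theta0, theta1 = 1771.80851064, 28.77659574

# Fitted values and residuals (observed minus fitted).
fitted = [theta0 + theta1 * x for x in square_feet]
residuals = [y - f for y, f in zip(price, fitted)]

# R^2 = 1 - SS_res / SS_tot: the share of the variance in price explained by the line.
mean_price = sum(price) / len(price)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - mean_price) ** 2 for y in price)
r_squared = 1 - ss_res / ss_tot

for x, y, r in zip(square_feet, price, residuals):
    print(x, y, round(r, 2))
print("R^2 =", round(r_squared, 3))
```

An R² close to 1 indicates that the line captures most of the variation; large individual residuals point to rows that the line fits poorly.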

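2) Predicting which TV show will have more viewers next week

The Flash is an American television series created by writers/producers Greg Berlanti, Andrew Kreisberg, and Geoff Johns, airing on The CW. It is based on the DC Comics character the Flash (Barry Allen), a costumed crime-fighting superhero with the power to move at superhuman speed, who was created by Robert Kanigher, John Broome, and Carmine Infantino. It is a spin-off of Arrow and exists in the same universe. The pilot was written by Berlanti, Kreisberg, and Johns and directed by David Nutter. The series premiered in North America on October 7, 2014, and became The CW's highest-rated show.

Arrow is a television series created by writers/producers Greg Berlanti, Marc Guggenheim, and Andrew Kreisberg. It is based on the DC Comics character Green Arrow, a costumed crime fighter created by Mort Weisinger and George Papp. It premiered in North America on October 10, 2012, and began airing internationally in late 2012. Filmed primarily in Vancouver, British Columbia, Canada, the series follows billionaire playboy Oliver Queen, played by Stephen Amell, who returns home after five years stranded on a hostile island to fight crime and corruption as a secret vigilante whose weapon of choice is the bow and arrow. Unlike in the comic books, Queen does not initially use the alias "Green Arrow".

Since these two shows are tied as my favorite TV shows, I have always wondered which one is more popular with other people, and which will ultimately win the ratings war. So let's write a program to predict which show will have more viewers. We need a dataset that gives the number of viewers for each episode. Fortunately, I was able to obtain this data and compile it into a .csv file, shown below.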

Flash episode   Flash US viewers   Arrow episode   Arrow US viewers
1               4.83               1               2.84
2               4.27               2               2.32
3               3.59               3               2.55
4               3.53               4               2.49
5               3.46               5               2.73
6               3.73               6               2.6
7               3.47               7               2.64
8               4.34               8               3.92
9               4.66               9               3.06

Viewer numbers are in millions.

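Steps to solve the problem:

First we need to convert the data into X_parameters and Y_parameters, but here we have two sets of each. So let's name them flash_x_parameter, flash_y_parameter, arrow_x_parameter, and arrow_y_parameter. Then we fit the data to two separate linear regression models, first for The Flash and then for Arrow. Next we predict the number of viewers for the next episode of each show. Finally, we compare the results and infer which show will have more viewers.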
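The steps above can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the helper names fit_line and predict are hypothetical, the fit uses the closed-form least-squares formulas rather than scikit-learn, and "next episode" is taken to be episode 10 because the table ends at episode 9.

```python
# Viewers in millions for episodes 1..9, taken from the table.
episodes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
flash_viewers = [4.83, 4.27, 3.59, 3.53, 3.46, 3.73, 3.47, 4.34, 4.66]
arrow_viewers = [2.84, 2.32, 2.55, 2.49, 2.73, 2.6, 2.64, 3.92, 3.06]


def fit_line(xs, ys):
    """Least-squares fit of y = theta0 + theta1 * x; returns (theta0, theta1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = s_xy / s_xx
    return mean_y - theta1 * mean_x, theta1


def predict(model, x):
    """Evaluate the fitted line at x."""
    theta0, theta1 = model
    return theta0 + theta1 * x


next_episode = 10  # assumption: the episode right after the last row of the table

# One model per show, then compare the two predictions.
flash_pred = predict(fit_line(episodes, flash_viewers), next_episode)
arrow_pred = predict(fit_line(episodes, arrow_viewers), next_episode)

print("Flash predicted viewers (millions):", round(flash_pred, 3))
print("Arrow predicted viewers (millions):", round(arrow_pred, 3))
winner = "The Flash" if flash_pred > arrow_pred else "Arrow"
print("Predicted to have more viewers:", winner)
```

With only nine points per show and visibly non-linear viewer counts, a straight-line extrapolation like this is a rough guess rather than a reliable forecast.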

Step 1

Import our packages:

# Required Packages
import csv
import sys
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model

Step 2

Write a function that takes our dataset as input and returns the flash_x_parameter, flash_y_parameter, arrow_x_parameter, and arrow_y_parameter values.

# Function to get data
def get_data(file_name):
    data = pd.read_csv(file_name)
    flash_x_parameter = []
    flash_y_parameter = []
    arrow_x_parameter = []
    arrow_y_parameter = []
    for x1, y1, x2, y2 in zip(data['flash_episode_number'], data['flash_us_viewers'], data['arrow_episode_number'], data['arrow_us_viewers']):
        flash_x_parameter.append([float(x1)])
        flash_y_parameter.append(float(y1))
        arrow_x_parameter.append([float(x2)])
        arrow_y_parameter.append(float(y2))
    return flash_x_parameter, flash_y_parameter, arrow_x_parameter, arrow_y_parameter
