[Beginner's Guide] Lesson 3 - Getting Started with Paddle - Boston Housing Price Prediction


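The classic linear regression model is mainly used to make predictions on datasets that exhibit a linear relationship. A regression model can be understood as the process of fitting a curve to the distribution of a set of points. If the fitted curve is a straight line, it is called linear regression; if it is a quadratic curve, it is called quadratic regression. Linear regression is the simplest kind of regression model. This tutorial uses PaddlePaddle to build a housing price prediction model.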

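In linear regression:

(1) The hypothesis function describes, in mathematical terms, the relationship between the independent variables and the dependent variable; this relationship may be linear or nonlinear. In this linear regression model, the hypothesis function is Y' = wX + b, where Y' denotes the model's prediction (the predicted housing price), to distinguish it from the true value Y. The parameters the model has to learn are w and b.

(2) The loss function measures, in mathematical terms, the error between the hypothesis function's prediction and the true value. The smaller this gap, the more accurate the prediction, and the algorithm's task is to keep making it smaller. Once the model is built, we need to give it an optimization objective so that the learned parameters bring the prediction Y' as close as possible to the true value Y. The loss is a real number that reflects the size of the model's error. Different problem settings use different loss functions. For linear models, the most commonly used loss function is the mean squared error (MSE).

(3) The optimization algorithm: training a neural network means adjusting the weights (parameters) so that the loss is as small as possible. During training the loss gradually converges, yielding a set of weights (parameters) with which the network fits the true model. The ultimate goal of the optimization algorithm is therefore to find the minimum of the loss function. This search consists of repeatedly making small adjustments to w and b, approaching the minimum step by step. Common optimization algorithms include stochastic gradient descent (SGD) and Adam.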
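The three pieces above (hypothesis function, loss function, optimization algorithm) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data is synthetic, the learning rate and step count are arbitrary choices, and the loop uses plain full-batch gradient descent rather than the stochastic variant used later in the tutorial.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 3x + 2 plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=200)
Y = 3.0 * X + 2.0 + rng.normal(0.0, 0.01, size=200)

def predict(w, b, x):
    """Hypothesis function Y' = wX + b."""
    return w * x + b

def mse(y_pred, y_true):
    """Mean squared error between predictions and true values."""
    return float(np.mean((y_pred - y_true) ** 2))

w, b = 0.0, 0.0          # parameters to learn
learning_rate = 0.1      # step size (assumed value)
initial_loss = mse(predict(w, b, X), Y)

for step in range(500):
    error = predict(w, b, X) - Y
    # Gradients of the MSE with respect to w and b.
    grad_w = 2.0 * np.mean(error * X)
    grad_b = 2.0 * np.mean(error)
    # Move each parameter a small step against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(predict(w, b, X), Y)
print(w, b, initial_loss, final_loss)
```

After the loop, w and b should sit close to the values used to generate the data, and the final loss should be far below the initial one.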

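First, import the required packages:

paddle.fluid ---> the PaddlePaddle deep learning framework

numpy ---> a basic Python library for scientific computing

os ---> a Python module for interacting with the operating system

matplotlib ---> a Python plotting library for drawing line charts, scatter plots, and other figures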

In[1]

import paddle.fluid as fluid

import paddle

import numpy as np

import os

import matplotlib.pyplot as plt

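Step 1: Prepare the data.

(1) Introduction to the uci-housing dataset

The dataset has 506 rows of 14 columns each. The first 13 columns describe various attributes of the houses, and the last column is the median price of that type of house.

PaddlePaddle provides interfaces for reading the uci_housing training set and test set: paddle.dataset.uci_housing.train() and paddle.dataset.uci_housing.test(), respectively.

(2) train_reader and test_reader

paddle.batch() groups every BATCH_SIZE samples into one batch.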

In[2]

BUF_SIZE=500

BATCH_SIZE=20

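# Data provider for training: shuffles samples in a buffer of BUF_SIZE and yields batches of BATCH_SIZE

train_reader = paddle.batch(
    paddle.reader.shuffle(paddle.dataset.uci_housing.train(),
                          buf_size=BUF_SIZE),
    batch_size=BATCH_SIZE)

# Data provider for testing: shuffles samples in a buffer of BUF_SIZE and yields batches of BATCH_SIZE

test_reader = paddle.batch(
    paddle.reader.shuffle(paddle.dataset.uci_housing.test(),
                          buf_size=BUF_SIZE),
    batch_size=BATCH_SIZE)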

[Output: on the first run, housing/housing.data is not found locally and is downloaded. paddle/dataset/uci_housing.py (line 49) then emits a matplotlib UserWarning: its call to matplotlib.use('Agg') has no effect because the backend had already been set to 'module://ipykernel.pylab.backend_inline'. The accompanying stack trace is omitted here.]
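What the reader pipeline in this step does can be pictured with plain Python generators. The sketch below is not PaddlePaddle's implementation: shuffle_reader, batch_reader, and toy_reader are hypothetical stand-ins, written on the assumption that a reader is a function returning an iterator over samples, that shuffling happens inside a buffer of buf_size samples, and that a short final batch is kept.

```python
import random

def shuffle_reader(reader, buf_size, seed=None):
    """Buffer up to buf_size samples, shuffle the buffer, then yield from it."""
    rng = random.Random(seed)

    def shuffled():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                yield from buf
                buf = []
        if buf:  # flush the partially filled buffer at the end
            rng.shuffle(buf)
            yield from buf

    return shuffled

def batch_reader(reader, batch_size):
    """Group consecutive samples into lists of batch_size; keep a short last batch."""
    def batched():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch

    return batched

def toy_reader():
    """Hypothetical dataset of 50 integer samples."""
    yield from range(50)

train_reader = batch_reader(
    shuffle_reader(toy_reader, buf_size=500, seed=0), batch_size=20)
batches = list(train_reader())
print([len(b) for b in batches])  # [20, 20, 10]
```

Every sample appears exactly once per pass; only the order and the grouping change.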

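(3) Print a sample to see what the data looks like. The data provided by the PaddlePaddle interface has already been normalized and otherwise preprocessed.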

(array([-0.02964322, -0.11363636, 0.39417967, -0.06916996, 0.14260276, -0.10109875, 0.30715859, -0.13176829, -0.24127857, 0.05489093, 0.29196451, -0.2368098 , 0.12850267]), array([15.6])),

In[3]

# Print one sample to inspect the uci_housing data

train_data = paddle.dataset.uci_housing.train()

sampledata = next(train_data())

print(sampledata)

(array([-0.0405441 , 0.06636364, -0.32356227, -0.06916996, -0.03435197,

0.05563625, -0.03475696, 0.02682186, -0.37171335, -0.21419304,

-0.33569506, 0.10143217, -0.21172912]), array([24.]))
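The small feature values in the printed sample come from normalization. One common scheme that produces values in this range is to center each column on its mean and divide by its range; the sketch below shows that scheme on made-up numbers. Whether this is exactly the preprocessing applied by the dataset interface is an assumption, and normalize_features is a hypothetical helper.

```python
import numpy as np

def normalize_features(data):
    """Center each column on its mean and scale by its range (max - min).

    Assumes no column is constant, otherwise the range would be zero.
    """
    data = np.asarray(data, dtype=np.float64)
    mean = data.mean(axis=0)
    value_range = data.max(axis=0) - data.min(axis=0)
    return (data - mean) / value_range

# Made-up raw features on very different scales.
raw = np.array([[1.0, 200.0],
                [2.0, 400.0],
                [3.0, 600.0]])
normalized = normalize_features(raw)
print(normalized)
```

After this transformation both columns span the same interval, so no single feature dominates the gradient because of its units.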

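Step 2: Network configuration

(1) Building the network: for linear regression, the network is simply a single fully connected layer from input to output.

For the Boston housing dataset, we assume that the relationship between the attributes and the housing price can be described by a linear combination of the attributes.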

In[4]

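# Define the tensor variable x, representing the 13-dimensional feature vector

x = fluid.layers.data(name='x', shape=[13], dtype='float32')

# Define the tensor y, representing the target value

y = fluid.layers.data(name='y', shape=[1], dtype='float32')

# Define a simple linear network: a fully connected layer connecting input and output

# input: the input tensor

# size: the number of output units of the layer

# act: the activation function

y_predict = fluid.layers.fc(input=x, size=1, act=None)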
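Conceptually, a fully connected layer with one output unit and no activation computes a weighted sum of the 13 features plus a bias. A NumPy sketch of that computation follows; the weights, bias, and batch are random or zero placeholders chosen for illustration, and fully_connected is a hypothetical helper rather than the framework's layer.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(size=(20, 13)).astype(np.float32)   # 20 samples, 13 features
weights = rng.normal(size=(13, 1)).astype(np.float32)  # hypothetical layer weights
bias = np.zeros((1,), dtype=np.float32)                # hypothetical layer bias

def fully_connected(x, w, b):
    """Linear layer without activation: one output value per sample."""
    return x @ w + b

y_predict = fully_connected(batch, weights, bias)
print(y_predict.shape)  # (20, 1)
```

The 13 weights and the single bias play the roles of w and b in the hypothesis function Y' = wX + b.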

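(2) Define the loss function

Here we use the mean squared error loss.

square_error_cost(input, label): takes the predicted values and the target values and returns the squared error for each sample, i.e. the square of (y - y_predict).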

In[5]

cost = fluid.layers.square_error_cost(input=y_predict, label=y)  # squared error of each sample in a batch

avg_cost = fluid.layers.mean(cost)  # average the loss over the batch
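The same two steps, per-sample squared error followed by a mean over the batch, look like this in NumPy. The target and predicted values are made up for illustration.

```python
import numpy as np

# Made-up targets and predictions for a batch of three samples.
y_true = np.array([[24.0], [15.6], [21.0]])
y_pred = np.array([[22.0], [16.6], [21.0]])

cost = (y_pred - y_true) ** 2   # per-sample squared error, shape (3, 1)
avg_cost = cost.mean()          # single scalar for the whole batch

print(cost.ravel())
print(avg_cost)
```

The optimizer works on the scalar average, so the size of the batch does not change the scale of the loss.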

(3) Define the optimizer

Here we use stochastic gradient descent.

In[6]

optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.001)

opts = optimizer.minimize(avg_cost)

In[7]

test_program = fluid.default_main_program().clone(for_test=True)

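Once the model above has been configured, we have two fluid.Program objects: fluid.default_startup_program() and fluid.default_main_program().

Parameter initialization operations are written into fluid.default_startup_program().

fluid.default_main_program() returns the default, global main program. The main program is used for training and testing the model. All layer functions in fluid.layers add operators and variables to default_main_program. default_main_program is also the default value of the Program argument in many of fluid's APIs; for example, when the user does not pass a program, Executor.run() executes default_main_program.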

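Step 3: Model training and Step 4: Model evaluation

(1) Create an Executor

First define where computation takes place: fluid.CPUPlace() and fluid.CUDAPlace(0) select the CPU and the GPU, respectively.

Executor: receives a program and runs it through its run() method.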

In[8]

use_cuda = False  # False runs on the CPU; True runs on the GPU

place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

exe = fluid.Executor(place)  # create an Executor instance, exe

exe.run(fluid.default_startup_program())  # run the startup program to initialize the parameters

[]

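(2) Define the input data format

DataFeeder converts the data returned by the data providers (train_reader, test_reader) into a data structure that can be fed to the Executor.

feed_list specifies the list of variables, or variable names, that are fed to the model.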

In[9]

# Define the input data format

feeder = fluid.DataFeeder(place=place, feed_list=[x, y])  # feed_list: the variables (or variable names) fed to the model
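As a rough picture of this conversion, the sketch below turns one batch (a list of (features, label) samples) into a dictionary with one stacked array per input name. It is a plain NumPy analogue under assumed conventions, not the DataFeeder implementation; to_feed_dict and the sample values are hypothetical.

```python
import numpy as np

def to_feed_dict(batch, names=("x", "y")):
    """Stack a list of (features, label) samples into one array per input name."""
    columns = zip(*batch)  # regroup: all features together, all labels together
    return {name: np.stack(col).astype(np.float32)
            for name, col in zip(names, columns)}

# A made-up batch of two samples, each with 13 features and 1 label.
batch = [
    (np.zeros(13), np.array([24.0])),
    (np.ones(13), np.array([15.6])),
]
feed = to_feed_dict(batch)
print(feed["x"].shape, feed["y"].shape)  # (2, 13) (2, 1)
```

The keys match the names given to the input variables ('x' and 'y'), which is how each array finds its way to the right input.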

(3) Define draw_train_process, a function that plots how the training loss changes over the course of training

In[10]

iter = 0
iters = []
train_costs = []

def draw_train_process(iters, train_costs):
    title = "training cost"
    plt.title(title, fontsize=24)
    plt.xlabel("iter", fontsize=14)
    plt.ylabel("cost", fontsize=14)
    plt.plot(iters, train_costs, color='red', label='training cost')
    plt.show()

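(4) Train and save the model

The Executor receives a program and, based on the feed map (input mapping) and the fetch_list (list of results to retrieve), adds feed operators (data input operators) and fetch operators (result retrieval operators) to the program. The feed map supplies the program's input data. The fetch_list names the variables the user expects back once the program has run.

Note: the enumerate() function turns an iterable (such as a list, tuple, or string) into an indexed sequence, yielding each item together with its index.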
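A short illustration of enumerate() in the way a training loop typically uses it, pairing each batch with its index. The strings below are placeholders standing in for real batches from a reader.

```python
batches = ["batch-a", "batch-b", "batch-c"]  # placeholders for batches from a reader

seen = []
for batch_id, data in enumerate(batches):
    # batch_id counts from 0; data is the corresponding item.
    seen.append((batch_id, data))
    print(batch_id, data)
```

The index is convenient for logging the loss only every few batches.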

In[11]
