The Requests Library in Detail
urllib is Python's basic built-in HTTP library; requests builds on the same ground (it is implemented on top of urllib3).
However, urllib is less convenient to use than requests: the previous article on urllib skipped things such as proxy settings and cookie handling because they felt cumbersome, and sending POST requests was also fairly tedious.
In a word, requests is a simple, easy-to-use HTTP library implemented in Python.
For crawler work from now on, requests is recommended instead of urllib.
Usage walkthrough:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
response = requests.get('http://www.baidu.com')
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.text)
print(response.cookies)
Output:
<class 'requests.models.Response'>
200
<class 'str'>
<!DOCTYPE html>
<!--STATUS OK--><html>省略了 </html>
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
As the small program above shows, requests returns str data directly, whereas urllib's read() returns bytes that still have to be decoded, which is cumbersome; likewise response.cookies fetches the cookies in one step instead of the roundabout handling urllib needs.
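To make the contrast concrete, here is a minimal sketch (assuming http://www.baidu.com is reachable) of the decode step urllib requires versus requests:
from urllib import request
import requests

# urllib: read() returns bytes that must be decoded manually
resp = request.urlopen('http://www.baidu.com')
html = resp.read().decode('utf-8')

# requests: .text is already a str
html = requests.get('http://www.baidu.com').text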
Various request methods
import requests
requests.post('http://www.baidu.com')
requests.put('http://www.baidu.com')
requests.delete('http://www.baidu.com')
requests.head('http://www.baidu.com')
requests.options('http://www.baidu.com')
Basic GET request:
Test a GET request against http://httpbin.org/get:
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)
Output:
{"args":{},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Host":"","User-Agent":"python-
requests/2.18.4"},"origin":"113.71.243.133","url":"/get"}
GET request with parameters:
import requests
data = {'name':'geme','age':'22'}
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)
Output:
{"args":{"age":"22","name":"geme"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Host":"","User-Agent":"python-
requests/2.18.4"},"origin":"113.71.243.133","url":"/get?age=22&name=geme"}
Parsing JSON
What comes back is actually a JSON-formatted string; response.json() parses it directly, and the result is exactly the same as what the json module's loads method produces.
import requests
import json
response = requests.get('http://httpbin.org/get')
print(type(response.text))
print(response.json())
print(json.loads(response.text))
print(type(response.json()))
Output:
<class 'str'>
{'headers': {'Connection': 'close', 'User-Agent': 'python-requests/2.18.4', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org'}, 'url': 'http://httpbin.org/get', 'args': {}, 'origin': '113.71.243.133'}
{'headers': {'Connection': 'close', 'User-Agent': 'python-requests/2.18.4', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org'}, 'url': 'http://httpbin.org/get', 'args': {}, 'origin': '113.71.243.133'}
<class 'dict'>
This method comes in handy when analyzing Ajax requests.
Fetching binary data
Binary data is what you typically need when downloading images or videos.
import requests
response = requests.get('https://github.com/favicon.ico')
print(type(response.text), type(response.content))
print(response.text)
print(response.content)
# response.content gives the binary content
Saving files was covered in the article on crawler fundamentals, so it won't be repeated here.
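Still, for quick reference, a minimal sketch of writing the binary content to disk (the local filename favicon.ico is just an example):
import requests

response = requests.get('https://github.com/favicon.ico')
# write the raw bytes from response.content to a local file
with open('favicon.ico', 'wb') as f:
    f.write(response.content)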
Adding headers
import requests
response = requests.get('https://www.zhihu.com/explore')
print(response.text)
Output:
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>openresty</center>
</body>
</html>
Requesting this URL without headers returns a 400 status code; now try again with headers:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
response = requests.get('https://www.zhihu.com/explore', headers=headers)
print(response.text)
With the headers added, the request runs normally.
Basic POST request
import requests
data = {'name':'haha','age':'12'}
response = requests.post('http://httpbin.org/post', data=data)
print(response.text)
Output:
{"args":{},"data":"","files":{},"form":{"age":"12","name":"haha"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Content-Length":"16","Content-Type":"application/x-www-form-urlencoded","Host":"","User-Agent":"python-requests/2.18.4"},"json":null,"origin":"113.71.243.133","url":"/post"}
Unlike urllib, no encoding step is required, which is much more convenient, and headers can be added to a POST request as well.
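As a sketch of that last point, headers are passed to requests.post the same way as to requests.get (the User-Agent value here is only an example):
import requests

headers = {'User-Agent': 'Mozilla/5.0'}  # example value only
data = {'name': 'haha', 'age': '12'}
response = requests.post('http://httpbin.org/post', data=data, headers=headers)
# httpbin echoes the request headers back in the JSON body
print(response.json()['headers']['User-Agent'])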
Response
Some commonly used response attributes (demonstrated in the sketch after this list):
response.status_code
response.headers
response.url
response.history
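A quick sketch printing each attribute:
import requests

response = requests.get('http://www.baidu.com')
print(response.status_code)  # numeric status code, e.g. 200
print(response.headers)      # dict-like object of response headers
print(response.url)          # final URL after any redirects
print(response.history)      # list of Response objects for redirects followed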
Checking the status code:
Different status codes correspond to different meanings.
import requests
response = requests.get('http://httpbin.org/get.html')
exit() if not response.status_code == requests.codes.not_found else print('404 not found')
# or, equivalently, replace that line with
exit() if not response.status_code == 404 else print('404 not found')
# because each status name corresponds to a numeric code
Output:
404 not found
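requests.codes maps readable names to the numeric codes, which is why the two comparisons above are interchangeable; a small sketch:
import requests

print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404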
Some advanced requests features
File upload:
import requests
url = 'http://httpbin.org/post'
file = {'file': open('tt.jpeg', 'rb')}
response = requests.post(url, files=file)
print(response.text)
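If the uploaded filename or MIME type needs to be controlled explicitly, requests also accepts a tuple per file; a sketch assuming the same tt.jpeg:
import requests

# (filename, file object, content type) instead of a bare file object
file = {'file': ('tt.jpeg', open('tt.jpeg', 'rb'), 'image/jpeg')}
response = requests.post('http://httpbin.org/post', files=file)
print(response.status_code)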
Getting cookies
import requests
response = requests.get('http://www.baidu.com')
print(response.cookies)
print(type(response.cookies))
The returned cookie object is a RequestsCookieJar, which behaves much like a dictionary; its keys and values can be printed out:
import requests
response = requests.get('http://www.baidu.com')
print(response.cookies)
# print(type(response.cookies))
for key, value in response.cookies.items():
    print(key + '=' + value)
Output:
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315
Session persistence:
import requests
requests.get('http://httpbin.org/cookies/set/number/12345678')  # line 1
response = requests.get('http://httpbin.org/cookies')  # line 2
print(response.text)
输出结果:{"cookies":{}}layers游戏
httpbin.org provides a /cookies/set endpoint that sets a cookie,
and the GET to /cookies then tries to read it back,
but the output is empty, because the GETs on line 1 and line 2 behave like two separate browsers: the cookie set by the first request is not carried by the second.
import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/12345678')
response = s.get('http://httpbin.org/cookies')
print(response.text)
输出结果为:{"cookies":{"number":"12345678"}}
Certificate verification:
import requests
response = requests.get('https://www.12306.cn')
print(response.status_code)
Output: an SSLError is raised because certificate verification fails (traceback omitted).
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
Output:
D:\python-3.5.4.amd64\lib\site-packages\urllib3\connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
200
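The InsecureRequestWarning can be silenced with urllib3's disable_warnings; a sketch:
import requests
import urllib3

# suppress the warning emitted for unverified HTTPS requests
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)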
Proxy settings
When the proxy type is HTTP or HTTPS:
import requests
proxies = {
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9743'
}
# Declare a dict specifying the proxies and pass it along with the request; much more convenient than urllib
# If the proxy requires a password, declare it like this:
# proxies = {
#     'http': 'http://user:password@127.0.0.1:9743',
# }
# Modify it as above and it will work
response = requests.get('https://www.taobao.com', proxies=proxies)
print(response.status_code)
When the proxy type is SOCKS: pip3 install requests[socks]
proxies = {
'http':'socks5://127.0.0.1:9743',
'https':'socks5://127.0.0.1:9743'
}
response = requests.get('https://www.taobao.com', proxies=proxies)
print(response.status_code)
Timeout setting:
response = requests.get('http://www.baidu.com', timeout=1)
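timeout can also be given as a (connect, read) tuple to bound the two phases separately; a sketch:
import requests

# first value limits the connection phase, second limits reading the response
response = requests.get('http://www.baidu.com', timeout=(3, 7))
print(response.status_code)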
Authentication
Some sites require login authentication, which can be handled with the auth parameter:
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('http://120.27.34.24:900', auth=HTTPBasicAuth('user', 'password'))
# r = requests.get('http://120.27.34.24:900', auth=('user', 'password')) also works
print(r.status_code)
Exception handling:
import requests
from requests.exceptions import ReadTimeout, HTTPError, RequestException
try:
    response = requests.get('http://www.baidu.com', timeout=1)
    print(response.status_code)
except ReadTimeout:
    print('timeout')
except HTTPError:
    print('http error')
except RequestException:
    print('error')
This example imports only three exceptions; the official documentation lists others, which can be imported and handled in the same way.