The Requests Library in Detail
urllib is Python's basic built-in HTTP library; requests builds on the same ground (it is implemented on top of urllib3).
However, urllib is less convenient to use than requests: the previous article on urllib skipped things such as proxy settings and cookie handling because they felt cumbersome, and sending POST requests was also fairly tedious.
In a word, requests is a simple, easy-to-use HTTP library implemented in Python.
For crawler work from now on, requests is recommended instead of urllib.
Usage walkthrough:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
response = requests.get('http://www.baidu.com')
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.text)
print(response.cookies)
Output:
<class 'requests.models.Response'>
200
<class 'str'>
<!DOCTYPE html>
<!--STATUS OK--><html>省略了 </html>
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
As the small program above shows, requests returns str data directly, whereas urllib's read() returns bytes that still have to be decoded, which is cumbersome; likewise response.cookies fetches the cookies in one step instead of the roundabout handling urllib needs.
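To make the contrast concrete, here is a minimal sketch (assuming http://www.baidu.com is reachable) of the decode step urllib requires versus requests:
from urllib import request
import requests

# urllib: read() returns bytes that must be decoded manually
resp = request.urlopen('http://www.baidu.com')
html = resp.read().decode('utf-8')

# requests: .text is already a str
html = requests.get('http://www.baidu.com').text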
Various request methods
import requests
requests.post('http://www.baidu.com')
requests.put('http://www.baidu.com')
requests.delete('http://www.baidu.com')
requests.head('http://www.baidu.com')
requests.options('http://www.baidu.com')
Basic GET request:
Test a GET request against http://httpbin.org/get:
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)
Output:
{"args":{},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Host":"","User-Agent":"python-
requests/2.18.4"},"origin":"113.71.243.133","url":"/get"}
GET request with parameters:
import requests
data = {'name':'geme','age':'22'}
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)
Output:
{"args":{"age":"22","name":"geme"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Host":"","User-Agent":"python-
requests/2.18.4"},"origin":"113.71.243.133","url":"/get?age=22&name=geme"}
Parsing JSON
What comes back is actually a JSON-formatted string; response.json() parses it directly, and the result is exactly the same as what the json module's loads method produces.
import requests
import json
response = requests.get('http://httpbin.org/get')
print(type(response.text))
print(response.json())
print(json.loads(response.text))
print(type(response.json()))
Output:
<class 'str'>
{'headers': {'Connection': 'close', 'User-Agent': 'python-requests/2.18.4', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org'}, 'url': 'http://httpbin.org/get', 'args': {}, 'origin': '113.71.243.133'}
{'headers': {'Connection': 'close', 'User-Agent': 'python-requests/2.18.4', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org'}, 'url': 'http://httpbin.org/get', 'args': {}, 'origin': '113.71.243.133'}
<class 'dict'>
This method comes in handy when analyzing Ajax requests.
Fetching binary data
Binary data is what you typically need when downloading images or videos.
import requests
response = requests.get('https://github.com/favicon.ico')
print(type(response.text), type(response.content))
print(response.text)
print(response.content)
# response.content gives the binary content
Saving files was covered in the article on crawler fundamentals, so it won't be repeated here.
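Still, for quick reference, a minimal sketch of writing the binary content to disk (the local filename favicon.ico is just an example):
import requests

response = requests.get('https://github.com/favicon.ico')
# write the raw bytes from response.content to a local file
with open('favicon.ico', 'wb') as f:
    f.write(response.content)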
Adding headers
import requests
response = requests.get('https://www.zhihu.com/explore')
print(response.text)
Output:
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>openresty</center>
</body>
</html>
Requesting this URL without headers returns a 400 status code; now try again with headers:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
response = requests.get('https://www.zhihu.com/explore', headers=headers)
print(response.text)
With the headers added, the request runs normally.
Basic POST request
import requests
data = {'name':'haha','age':'12'}
response = requests.post('http://httpbin.org/post', data=data)
print(response.text)
Output:
{"args":{},"data":"","files":{},"form":{"age":"12","name":"haha"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Content-Length":"16","Content-Type":"application/x-www-form-urlencoded","Host":"","User-Agent":"python-requests/2.18.4"},"json":null,"origin":"113.71.243.133","url":"/post"}
Unlike urllib, no encoding step is required, which is much more convenient, and headers can be added to a POST request as well.
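As a sketch of that last point, headers are passed to requests.post the same way as to requests.get (the User-Agent value here is only an example):
import requests

headers = {'User-Agent': 'Mozilla/5.0'}  # example value only
data = {'name': 'haha', 'age': '12'}
response = requests.post('http://httpbin.org/post', data=data, headers=headers)
# httpbin echoes the request headers back in the JSON body
print(response.json()['headers']['User-Agent'])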
Response
Some commonly used response attributes (demonstrated in the sketch after this list):
response.status_code
response.headers
response.url
response.history
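A quick sketch printing each attribute:
import requests

response = requests.get('http://www.baidu.com')
print(response.status_code)  # numeric status code, e.g. 200
print(response.headers)      # dict-like object of response headers
print(response.url)          # final URL after any redirects
print(response.history)      # list of Response objects for redirects followed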
Checking the status code:
Different status codes correspond to different meanings.
import requests
response = requests.get('http://httpbin.org/get.html')
exit() if not response.status_code == requests.codes.not_found else print('404 not found')
# or, equivalently, replace that line with
exit() if not response.status_code == 404 else print('404 not found')
# because each status name corresponds to a numeric code
Output:
404 not found
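requests.codes maps readable names to the numeric codes, which is why the two comparisons above are interchangeable; a small sketch:
import requests

print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404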
Some advanced requests features
File upload:
import requests
url = 'http://httpbin.org/post'
file = {'file': open('tt.jpeg', 'rb')}
response = requests.post(url, files=file)
print(response.text)
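If the uploaded filename or MIME type needs to be controlled explicitly, requests also accepts a tuple per file; a sketch assuming the same tt.jpeg:
import requests

# (filename, file object, content type) instead of a bare file object
file = {'file': ('tt.jpeg', open('tt.jpeg', 'rb'), 'image/jpeg')}
response = requests.post('http://httpbin.org/post', files=file)
print(response.status_code)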
Getting cookies
import requests
response = requests.get('http://www.baidu.com')
print(response.cookies)
print(type(response.cookies))
The returned cookie object is a RequestsCookieJar, which behaves much like a dictionary; its keys and values can be printed out:
import requests
response = requests.get('http://www.baidu.com')
print(response.cookies)
# print(type(response.cookies))
for key, value in response.cookies.items():
    print(key + '=' + value)
Output:
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315
Session persistence:
import requests
requests.get('http://httpbin.org/cookies/set/number/12345678')  # line 1
response = requests.get('http://httpbin.org/cookies')  # line 2
print(response.text)
输出结果:{"cookies":{}}layers游戏
httpbin.org provides a /cookies/set endpoint that sets a cookie,
and the GET to /cookies then tries to read it back,
but the output is empty, because the GETs on line 1 and line 2 behave like two separate browsers: the cookie set by the first request is not carried by the second.
import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/12345678')
response = s.get('http://httpbin.org/cookies')
print(response.text)
输出结果为:{"cookies":{"number":"12345678"}}
Certificate verification:
import requests
response = requests.get('https://www.12306.cn')
print(response.status_code)
Output: an SSLError is raised because certificate verification fails (traceback omitted).
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
Output:
D:\python-3.5.4.amd64\lib\site-packages\urllib3\connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
200
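The InsecureRequestWarning can be silenced with urllib3's disable_warnings; a sketch:
import requests
import urllib3

# suppress the warning emitted for unverified HTTPS requests
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)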
Proxy settings
When the proxy type is HTTP or HTTPS:
import requests
proxies = {
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9743'
}
# Declare a dict specifying the proxies and pass it along with the request; much more convenient than urllib
# If the proxy requires a password, declare it like this:
# proxies = {
#     'http': 'http://user:password@127.0.0.1:9743',
# }
# Modify it as above and it will work
response = requests.get('https://www.taobao.com', proxies=proxies)
print(response.status_code)
When the proxy type is SOCKS: pip3 install requests[socks]
proxies = {
'http':'socks5://127.0.0.1:9743',
'https':'socks5://127.0.0.1:9743'
}
response = requests.get('https://www.taobao.com', proxies=proxies)
print(response.status_code)
Timeout setting:
response = requests.get('http://www.baidu.com', timeout=1)
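timeout can also be given as a (connect, read) tuple to bound the two phases separately; a sketch:
import requests

# first value limits the connection phase, second limits reading the response
response = requests.get('http://www.baidu.com', timeout=(3, 7))
print(response.status_code)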
Authentication
Some sites require login authentication, which can be handled with the auth parameter:
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('http://120.27.34.24:900', auth=HTTPBasicAuth('user', 'password'))
# r = requests.get('http://120.27.34.24:900', auth=('user', 'password')) also works
print(r.status_code)
Exception handling:
import requests
from requests.exceptions import ReadTimeout, HTTPError, RequestException
try:
    response = requests.get('http://www.baidu.com', timeout=1)
    print(response.status_code)
except ReadTimeout:
    print('timeout')
except HTTPError:
    print('http error')
except RequestException:
    print('error')
This example imports only three exceptions; the official documentation lists others, which can be imported and handled in the same way.