Python Crawler Series (3): Extracting JSON — the xiaoheihe Site as an Example
In this chapter, we take a JSON endpoint from the old (pre-redesign) xiaoheihe web site as an example and walk through, in detail, how to extract fields from a JSON response and save them to an Excel file. (Saving to a database works much the same way; only the output format differs.)
First, as usual, set up the headers and the URL:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}
url = 'api.xiaoheihe/bbs/web/link/list?limit=20&offset=0&topic_id=55058&heybox_id=17864741&sort_filter=reply&type_filter=all&os_type=web&version=9'

Next, we look for a pattern in the URL and find that the parameter controlling pagination is offset, which gives us:
for j in range(0, 141):
    # each page holds limit=20 items, so the offset advances in steps of 20
    url = 'api.xiaoheihe/bbs/web/link/list?limit=20&offset={0}&topic_id=55058&heybox_id=17864741&sort_filter=reply&type_filter=all&os_type=web&version=9'.format(j * 20)

This way we control the page jumps ourselves. Then, following the structure of the JSON, we save the fields into an Excel file:
dataes = []
analyse = ['标题', '点击量', '点赞量', '评论量', '内容']  # title, clicks, upvotes, comments, content
dataes.append(analyse)
for i in range(0, 18):
    sentences.append(json_page['result']['links'][i]['title'])
    sentences.append(json_page['result']['links'][i]['click'])
    sentences.append(json_page['result']['links'][i]['up'])
    sentences.append(json_page['result']['links'][i]['comment_num'])
    sentences.append(json_page['result']['links'][i]['description'])
    dataes.append(sentences)
    print(sentences)
    sentences = []
print('第{}页'.format(j))  # "page {}"
workbook = xlsxwriter.Workbook('loldata2.xlsx')
worksheet = workbook.add_worksheet()
for j in range(0, 2450):
    worksheet.write_row('A' + str(j + 1), dataes[j])
workbook.close()
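To see the JSON-extraction step in isolation, here is a minimal self-contained sketch. It parses a hand-made payload with the same `result` → `links` shape as the xiaoheihe response (the sample values are made up for illustration) and builds the same row layout the snippet above writes to Excel:

```python
import json

# A hand-made payload mimicking the shape of the xiaoheihe response
# (all values here are invented for illustration).
raw = json.dumps({
    'result': {
        'links': [
            {'title': 'post A', 'click': 120, 'up': 5,
             'comment_num': 3, 'description': 'first post'},
            {'title': 'post B', 'click': 98, 'up': 2,
             'comment_num': 1, 'description': 'second post'},
        ]
    }
})

json_page = json.loads(raw)
rows = [['标题', '点击量', '点赞量', '评论量', '内容']]  # header row
for link in json_page['result']['links']:
    rows.append([link['title'], link['click'], link['up'],
                 link['comment_num'], link['description']])

print(rows[1])  # ['post A', 120, 5, 3, 'first post']
```

Each entry in `rows` can then be handed to `worksheet.write_row` exactly as in the snippet above.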
Here is the complete code:
#-*- coding: utf-8 -*-
import requests
import time
import re
import json
import xlsxwriter

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}
h = 'api.xiaoheihe'

# Fetch each page of the post list and collect the link data
def get_selist():
    dataes = []
    analyse = ['标题', '点击量', '点赞量', '评论量', '内容']  # title, clicks, upvotes, comments, content
    dataes.append(analyse)
    for j in range(0, 141):
        # each page holds limit=20 items, so the offset advances in steps of 20
        url = 'api.xiaoheihe/bbs/web/link/list?limit=20&offset={0}&topic_id=55058&heybox_id=17864741&sort_filter=reply&type_filter=all&os_type=web&version=9'.format(j * 20)
        sentences = []
        response = requests.get(url, headers=headers)  # fetch one page of the post list
        json_page = json.loads(response.text)
        for i in range(0, 18):
            sentences.append(json_page['result']['links'][i]['title'])
            sentences.append(json_page['result']['links'][i]['click'])
            sentences.append(json_page['result']['links'][i]['up'])
            sentences.append(json_page['result']['links'][i]['comment_num'])
            sentences.append(json_page['result']['links'][i]['description'])
            dataes.append(sentences)
            print(sentences)
            sentences = []
        print('第{}页'.format(j))  # "page {}"
    workbook = xlsxwriter.Workbook('loldata2.xlsx')
    worksheet = workbook.add_worksheet()
    for j in range(0, 2450):
        worksheet.write_row('A' + str(j + 1), dataes[j])
    workbook.close()
if __name__ == '__main__':
    get_selist()
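The introduction noted that saving to a database works much the same way. As a hedged sketch of that variant (the table name, column names, and sample rows are my own choices, not part of the original article), the same row layout could go straight into SQLite using only the standard library:

```python
import sqlite3

# Sample rows in the same [title, clicks, upvotes, comments, content]
# layout the crawler produces (values invented for illustration).
rows = [
    ['post A', 120, 5, 3, 'first post'],
    ['post B', 98, 2, 1, 'second post'],
]

# An in-memory database; a real run would pass a file path instead.
conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE links (
    title TEXT, click INTEGER, up INTEGER,
    comment_num INTEGER, description TEXT)''')
conn.executemany('INSERT INTO links VALUES (?, ?, ?, ?, ?)', rows)
conn.commit()

count = conn.execute('SELECT COUNT(*) FROM links').fetchone()[0]
print(count)  # 2
```

Compared with the Excel version, the only change is replacing the `xlsxwriter` calls with `executemany`; the crawling and JSON-parsing code stays identical.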
With that, the data links from xiaoheihe have been crawled in full and saved.