7 - Scraping administrative divisions via an API (urllib).ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
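"In this tutorial, you will learn how to use the Amap (高德地图) API to fetch administrative divisions.\n",
"\n",
"The base data provided is:\n",
"\n",
"None. We build the dataset from scratch."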
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Observing network requests"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
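"We will fetch the data from Amap (高德地图). First, observe which network requests are made when you search Amap for one of Shenzhen's administrative districts.\n",
" \n",
"In Chrome, right-click and choose Inspect, or open Developer Tools from the settings menu, then select the Network tab to see the page's network requests (other browsers offer a similar feature).\n",
"\n",
"What we use here is 'crawler 2.0'.\n",
"\n",
"Each network request has:\n",
"\n",
"    Response Headers\n",
"    Request Headers\n",
"    Query String Parameters\n",
" \n",
"Of these, the request headers and the query string parameters are the ones we need to pay attention to."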
]
},
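The request headers and query string parameters seen in the Network tab can be reproduced with `urllib`. The following is a minimal sketch, not the notebook's own code: the base URL, the parameter values, the `User-Agent` string, and the helper name `build_request` are all placeholders chosen for illustration. It only builds the request object and sends nothing over the network.

```python
from urllib import parse, request


def build_request(base_url, params, headers):
    """Attach URL-encoded query parameters and headers to a request object."""
    # urlencode percent-encodes non-ASCII values (as UTF-8 by default).
    query_string = parse.urlencode(params)
    return request.Request(base_url + '?' + query_string, headers=headers)


# Hypothetical values for illustration only.
req = build_request(
    'https://example.com/search',
    {'keywords': '罗湖区', 'output': 'json'},
    {'User-Agent': 'Mozilla/5.0'},
)

print(req.full_url)
# urllib stores header names capitalized, hence 'User-agent'.
print(req.get_header('User-agent'))
```

Passing the resulting object to `request.urlopen` would actually send it; that step is left out here so the sketch runs offline.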
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The JSON data format"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In these network requests, the data the server sends back is structured as JSON. So what is JSON?" ]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
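"First, some background: review Python's list, dict, and tuple.\n",
"\n",
"Nesting lists and dicts freely gives you the shape of JSON: a JSON object corresponds to a dict and a JSON array to a list (a tuple is also written out as an array)."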
]
},
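To make the correspondence between JSON and Python containers concrete, here is a small sketch using the standard `json` module. The JSON text is a made-up sample, not a real server response.

```python
import json

# Made-up JSON text for illustration only.
text = '{"name": "罗湖区", "level": "district", "districts": [{"name": "A"}, {"name": "B"}]}'

data = json.loads(text)

# A JSON object becomes a dict, a JSON array becomes a list.
print(type(data).__name__)
print(type(data['districts']).__name__)

# Nested values are reached with ordinary indexing and iteration.
names = [child['name'] for child in data['districts']]
print(names)

# JSON has no tuple type: a tuple is written as an array and read back as a list.
round_trip = json.loads(json.dumps({'point': (1, 2)}))
print(round_trip)
```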
{
"cell_type": "markdown",
"metadata": {},
"source": [
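"Scraping the Amap website directly is difficult because it has anti-scraping measures.\n",
"\n",
"However, Amap provides a dedicated data API for developers.\n",
"\n",
"You will need to register as an Amap developer and apply for a key."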
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
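"There, Amap already provides a developer-oriented administrative district query service, along with its documentation.\n",
"\n",
"Pick a district query there and look at the network request it makes."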
]
},
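A district query returns nested JSON, so it helps to see how such a tree can be flattened into rows. The sketch below runs on an invented sample. The field names `districts`, `name`, `adcode`, and `level` are assumptions about the response layout and should be checked against the service's documentation; the names and codes are made up, and `flatten` is a hypothetical helper.

```python
import json

# Invented sample response; field names are assumptions, values are made up.
sample = '''
{"status": "1",
 "districts": [
   {"name": "Parent", "adcode": "000001", "level": "district",
    "districts": [
      {"name": "Child A", "adcode": "000002", "level": "street", "districts": []},
      {"name": "Child B", "adcode": "000003", "level": "street", "districts": []}
    ]}
 ]}
'''


def flatten(districts, parent=None):
    """Walk the nested 'districts' lists depth-first and return one row per district."""
    rows = []
    for d in districts:
        rows.append({
            'name': d['name'],
            'adcode': d['adcode'],
            'level': d['level'],
            'parent': parent,
        })
        # Recurse into the children, recording the current district as their parent.
        rows.extend(flatten(d.get('districts', []), parent=d['name']))
    return rows


rows = flatten(json.loads(sample)['districts'])
for row in rows:
    print(row)
```

A list of dicts like `rows` can be handed directly to `pandas.DataFrame(rows)` if a table is wanted.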
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fetching the administrative divisions"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-19T03:44:12.354768Z", "start_time": "2020-01-19T03:44:11.803160Z" }
},
"outputs": [],
"source": [
"# Import the packages needed for scraping\n",
"import urllib\n",
"from urllib import parse\n",
"from urllib import request\n",
"\n",
"import pandas as pd\n",
"# Import json to parse the JSON response later\n",
"import json"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-19T03:44:13.091717Z", "start_time": "2020-01-19T03:44:13.086729Z" }
},
"outputs": [],
"source": [
"mykey = 'enter your key here'\n",
"# Put your developer key here. It tells Amap who is making the request, and there is a daily quota. Register as a developer to get your own key."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-19T03:44:16.032917Z",
"start_time": "2020-01-19T03:44:14.094053Z"
}
},
"outputs": [],
"source": [
"keywords = '罗湖区'  # Luohu District, Shenzhen\n",
"\n",
"# Endpoint of the district query API\n",
"url = 'https://restapi.amap.com/v3/config/district?'\n",
"\n",
"# Query parameters\n",
"dict1 = {\n",
"    'subdistrict':'3',\n",
"    'showbiz':'false',\n",
"    'extensions':'all',\n",
"    'key':mykey,  # developer key; identifies who is making the request and is subject to a daily quota\n",
"    's':'rsv3',\n",
"    'output':'json',\n",
"    'level':'district',\n",
"    'keywords':keywords,\n",
"    'platform':'JS',\n",
"    'logversion':'2.0',\n",
"    'sdkversion':'1.4.10'\n",
"}\n",