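How to scrape a Douyin video list with Python
Suppose you come across a Douyin vlogger whose videos you really like and want to dump all of them. How do you do it? This article shows how to use Python to export the information for every video posted by a given user.
Packet capture analysis
Chrome Developer Tools
Scroll down the user's profile page and select the Network => XHR tab. You will see a request similar to this one: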
:authority: www.iesdouyin
:method: GET
:path: /web/api/v2/aweme/post/?user_id=110677980134&sec_uid=&count=21&max_cursor=1561112910000&aid=1128&_signature=3Xf-nxAQgGfUO4SKisB.Ld13
:scheme: https
accept: application/json
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9,en;q=0.8,ja;q=0.7,zh-TW;q=0.6,da;q=0.5
cookie: tt_webid=6690145457198417412; _ga=GA1.2.605400954.1557670882; _ba=BA0.2-20181226-5199e-GIJXgXk9ajNkyFhmv7Wy; _gid=GA1.2.191450152
referer: www.iesdouyin/share/user/110677980134
user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1
x-requested-with: XMLHttpRequest
Screenshot of the returned data
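Field        Type     Description
user_id      int      ID of the Douyin account
count        int      Number of items to return; use the default value 21
max_cursor   int      Request cursor; each request carries the max_cursor returned by the previous request
aid          int      Use the default value 1128
_signature   string   Parameter signature sent with every request
dytk         string   A parameter sent with every request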
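As a rough illustration of how the fields in the table fit together, the sketch below assembles the request path and query string with the standard library. The function name `build_post_path` is hypothetical, and the signature and dytk values in the usage line are placeholders, not real tokens.

```python
from urllib.parse import urlencode

# Path taken from the captured request shown above.
POST_PATH = "/web/api/v2/aweme/post/?"


def build_post_path(user_id, signature, dytk, max_cursor=0, count=21, aid=1128):
    """Return the request path plus query string for one page of a user's videos.

    Hypothetical helper: it only formats the parameters listed in the table.
    """
    params = {
        "user_id": user_id,
        "count": count,
        "max_cursor": max_cursor,  # 0 for the first page, then the value from the last response
        "aid": aid,
        "_signature": signature,
        "dytk": dytk,
    }
    return POST_PATH + urlencode(params)


# Placeholder signature and dytk values, for illustration only.
print(build_post_path("110677980134", "SIGNATURE", "DYTK", max_cursor=1561112910000))
```

Using `urlencode` rather than manual string concatenation keeps any special characters in the signature properly escaped.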
How to obtain the parameters:
Open the page source of www.iesdouyin/share/user/110677980134, which contains a script like this:
(function() {
    $(function() {
        __M.require('douyin_falcon:page/reflow_user/index').init({
            uid: "110677980134",
            dytk: '061ae6e81229e178146aa674327eba89'
        });
    });
})();
The dytk parameter can be extracted from this script with a regular expression.
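A minimal sketch of that regular-expression step, run against the inline script quoted above. The function name `extract_dytk` and the exact pattern are assumptions; the real page markup may differ and the pattern may need adjusting.

```python
import re

# Matches  dytk: '...'  or  dytk: "..."  and captures the quoted value.
DYTK_PATTERN = re.compile(r"""dytk\s*:\s*['"]([^'"]+)['"]""")


def extract_dytk(html):
    """Return the dytk token found in the page source, or None if it is absent."""
    match = DYTK_PATTERN.search(html)
    return match.group(1) if match else None


# Sample copied from the inline script shown in this article.
sample = """
__M.require('douyin_falcon:page/reflow_user/index').init({
    uid: "110677980134",
    dytk: '061ae6e81229e178146aa674327eba89'
});
"""

print(extract_dytk(sample))  # prints 061ae6e81229e178146aa674327eba89
```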
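Obtaining _signature is more involved. Douyin obfuscates and minifies its front-end JS code, so the algorithm is hard to work out directly. However, the signing code can simply be executed and the resulting signature returned.
The JS code can be executed with nodejs or selenium webdriver; selenium webdriver is recommended here. The nodejs execution environment differs from a browser's, so the signature it computes does not pass verification, whereas selenium webdriver drives a local browser and produces the same signature as visiting the page directly in the browser.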
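One way the browser-side signing might be driven from Python is sketched below. It assumes the `driver` object is a Selenium WebDriver that has already loaded a page on which the site's `_bytedAcrawler` object is defined; the function name `compute_signature` is hypothetical. Only `execute_script`, a standard WebDriver method, is used.

```python
def compute_signature(driver, user_id):
    """Run the page's own signing routine inside the browser and return its result.

    Assumes `driver` has already loaded a page that defines _bytedAcrawler.
    """
    # execute_script exposes extra positional arguments to the script as arguments[0], ...
    return driver.execute_script(
        "return _bytedAcrawler.sign(arguments[0]);", str(user_id)
    )


# Usage sketch (needs selenium and a local Chrome install; not executed here):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get(share_page_url)  # hypothetical variable: the user's share page
#   sign = compute_signature(driver, "110677980134")
#   driver.quit()
```

Passing the driver in as a parameter keeps the browser lifecycle outside the helper, so one browser session can sign many requests.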
After formatting the JS code, call the JS method _bytedAcrawler.sign("110677980134") to sign the parameter.
Code implementation
Exporting the profile page video list
import json
import requests

# signature, tk_logger, dict_merge and CHROME_HEADER are the author's own helpers (not shown here)
def get_user_video_list_by_uid(user_id, cursor=0):
    # Host reproduced as it appears in the source; requests needs a full URL including the scheme
    url = 'www.iesdouyin/web/api/v2/aweme/post/?'
    sign, dytk = signature(user_id)
    tk_logger.info("sign:%s,dytk:%s" % (sign, dytk))
    if sign is None or dytk is None:
        tk_logger.error("sign [%s] or dytk [%s] is none" % (sign, dytk))
        return None
    headers = dict_merge(CHROME_HEADER, {
        "Accept": "application/json",
        "X-Requested-With": "XMLHttpRequest",
    })
    params = {
        "user_id": user_id,
        "count": "21",
        "max_cursor": cursor,
        "aid": "1128",
        "_signature": sign,
        "dytk": dytk
    }
    res = requests.get(url, headers=headers, params=params)
    tk_logger.info("request url: %s" % res.url)
    content = res.content.decode("utf8")
    jsn = json.loads(content)
    return jsn
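The function above fetches a single page. To walk the whole list, each request has to carry the max_cursor returned by the previous one. The sketch below shows one way that loop might look. The response keys `aweme_list` and `has_more` are assumptions about the JSON layout (the article does not show the response body), and `iter_user_videos` is a hypothetical name. The page-fetching function is passed in so the loop can be tried without network access.

```python
def iter_user_videos(fetch_page, user_id, max_pages=50):
    """Yield every video entry for a user by following max_cursor across pages.

    fetch_page(user_id, cursor) should return the decoded JSON dict for one page,
    for example get_user_video_list_by_uid from this article.
    """
    cursor = 0
    for _ in range(max_pages):  # hard cap so a misbehaving cursor cannot loop forever
        page = fetch_page(user_id, cursor)
        if not page:
            break
        # "aweme_list" and "has_more" are assumed key names, not confirmed by the article.
        for item in page.get("aweme_list") or []:
            yield item
        if not page.get("has_more"):
            break
        cursor = page.get("max_cursor", 0)


# Usage sketch:
#   for video in iter_user_videos(get_user_video_list_by_uid, "110677980134"):
#       print(video)
```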
The retrieved video list information
Code snippet for fetching video details
def get_video_detail_by_id(video_id):
    # The query string is truncated in the source after app_name=aweme
    url = "aweme-hl.snssdk/aweme/v1/aweme/detail/?version_code=6.5.0&pass-region=1&pass-route=1&js_sdk_version=1.16.2.7&app_name=aweme"
    data = {"aweme_id": video_id}
    headers = {
        "sdk-version": "1",
        "x-Tt-Token": "00fc1e7950db67b5f43a312e9265cdfee513ea70c36d918c871f3bb553347f3db50ffca143b8722327b345816a75efca071d",
        "User-Agent": "Aweme 6.5.0 rv:65014 (iPhone; iOS 12.3.1; en_CN) Cronet",
        "Content-Type": "application/x-www-form-urlencoded",
        # The cookie value is truncated in the source
        "Cookie": "tt_webid=6636348554880222728; __tea_sdk__user_unique_id=6636348554880222728; odin_tt=76d9b82d6e6f2ddfc99719a5b5d44a7d703cf97",
        "X-Khronos": "1559956401",
        "X-Gorgon": "8300000000002e40eee38cad71d14037bd1385d18bc973f094f5",
    }
    ret = {}
    res = requests.post(url, data=data, headers=headers)
    if res.status_code == 200:
        # tk_logger.info("video detail raw:%s" % res.content.decode("utf8"))
        jsn = json.loads(res.content)
        detail = jsn.get("aweme_detail", {})
        # get_video_info and the other get_* helpers are the author's own (not shown here)
        video_info = get_video_info(detail)
        user_info = get_user_info(detail)
        play_addr = get_play_address(detail)
        video_cover = get_video_cover(detail)
        ret["video_info"] = video_info
        ret["user_info"] = user_info
        ret["play_addr"] = play_addr
        ret["video_cover"] = video_cover
    else:
        raise TKException("get video detail failed [%s][%d]" % (url, res.status_code))
    # Return the collected fields so callers such as download_video can use them
    return ret
Code snippet for downloading a video
import os

def download_video(detail):
    url = detail.get("play_addr", {}).get("url_list", [])
    if len(url) == 0:
        raise TKException("cannot get video url list [%s]" % detail)
    url = url[0]
    folder = DOWNLOAD_DIR + '/' + detail.get('user_info', {}).get("uid", "unknown")
    if not os.path.exists(folder):
        os.mkdir(folder)
    video_id = detail.get('video_info', {}).get('statistics', {}).get('aweme_id')
    # filename = "%s/%s" % (folder, detail.get("video_info", {}).get("desc", video_id) + ".mp4")
    filename = "%s/%s" % (folder, video_id + ".mp4")
    tk_logger.info("download video %s" % url)
    # Skip the download when a local file of the same size already exists
    if os.path.isfile(filename):
        file_size = get_remote_file_size(url)
        if file_size == os.path.getsize(filename):
            tk_logger.info("file already downloaded, skip ...")
            return
        else:
            tk_logger.info("download file , file size:%d" % file_size)
    # stream=True lets iter_content write chunks without loading the whole file into memory
    res = requests.get(url, headers=IOS_HEADER, stream=True)
    if res.status_code == 200:
        with open(filename, "wb") as fp:
            for chunk in res.iter_content(chunk_size=1024):
                fp.write(chunk)
    else:
        raise TKException("download video [%s] failed [%d]" % (url, res.status_code))

# Usage: fetch the detail first, then pass it to download_video
detail = get_video_detail_by_id(video_id)
download_video(detail)
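The download code calls `get_remote_file_size`, which the article does not show. A plausible version, sketched here with only the standard library, sends a HEAD request and reads the Content-Length header. This is an assumption about what the helper does, not the author's implementation. `parse_content_length` is a hypothetical helper, and a return value of -1 is used here to mean "size unknown".

```python
import urllib.request


def parse_content_length(headers):
    """Return Content-Length as an int, or -1 when it is missing or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return -1
    value = str(value).strip()
    return int(value) if value.isdigit() else -1


def get_remote_file_size(url, headers=None, timeout=10):
    """Ask the server for the file size with a HEAD request (no body is downloaded)."""
    request = urllib.request.Request(url, headers=headers or {}, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_content_length(response.headers)
```

Some servers omit Content-Length or reject HEAD requests, so a caller should treat -1 as "cannot compare" and download the file again.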
Downloading the video