Downloading TCGA data with a Python script: very simple, with open source code
Preface
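The official TCGA download tool, gdc-client, is sometimes very slow and often crashes, so I decided to write a small downloader of my own. It uses the TCGA API to download TCGA data, and, like gdc-client, it still needs a downloaded manifest file.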
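A minimal sketch of how the manifest drives the download: each row of the manifest carries a file UUID in its `id` column, and the download URL is that UUID appended to the API's data endpoint. It assumes the public GDC data endpoint `https://api.gdc.cancer.gov/data/` and the tab-separated GDC manifest layout (`id`, `filename`, `md5`, `size`, `state`); the helper `build_download_urls` and the sample rows are hypothetical. It uses the standard-library `csv` module instead of pandas so it runs without third-party packages.

```python
import csv
import io

# Assumption: the public GDC data endpoint; appending a file UUID gives that file's URL.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data/"

# Hypothetical two-file manifest in the tab-separated GDC layout
# (id, filename, md5, size, state). All values are placeholders.
SAMPLE_MANIFEST = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "00000000-0000-0000-0000-000000000001\tsample_a.txt\tplaceholder\t10\treleased\n"
    "00000000-0000-0000-0000-000000000002\tsample_b.txt\tplaceholder\t20\treleased\n"
)


def build_download_urls(manifest_file, endpoint=GDC_DATA_ENDPOINT):
    """Return one download URL per manifest row, built from the 'id' column."""
    reader = csv.DictReader(manifest_file, delimiter="\t")
    # Plain string concatenation keeps forward slashes on every OS
    return [endpoint + row["id"] for row in reader]


if __name__ == "__main__":
    for url in build_download_urls(io.StringIO(SAMPLE_MANIFEST)):
        print(url)
```

For a real manifest, pass an open file object, for example one opened with `open(path, newline="")`, instead of the `io.StringIO` sample.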
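Environment
Later in this post I also package the program as EXE files, in both a command-line version and a GUI version, so people without Python can use it too.
Environment: Python 3.6
Packages: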
os
pandas
requests
sys
argparse
signal
Code
```python
# coding:utf-8
'''
This tool simplifies the steps needed to download TCGA data. It has two main parameters:
-m is the manifest file path.
-s is the folder where the downloaded files are saved (it is best to create a new folder for the downloaded data).
This tool supports resuming after an interruption: restart the program and it will
continue from the last downloaded file.
author: chenwi
date: 2018/07/10
mail: chenwi4323@gmail
'''
import os
import sys
import argparse
import signal

import pandas as pd
import requests

print(__doc__)
# The requests below use verify=False, so silence the resulting InsecureRequestWarning
requests.packages.urllib3.disable_warnings()


def download(url, file_path):
    # Stream the response in chunks so large files are never held in memory
    r = requests.get(url, stream=True, verify=False)
    r.raise_for_status()  # stop instead of saving an error page as data
    total_size = int(r.headers.get('content-length', 0))
    temp_size = 0
    with open(file_path, "wb") as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                temp_size += len(chunk)
                f.write(chunk)
                if total_size:
                    # 50-character progress bar followed by the percentage done
                    done = int(50 * temp_size / total_size)
                    sys.stdout.write("\r[%s%s] %d%%" % ('#' * done, ' ' * (50 - done),
                                                       100 * temp_size / total_size))
                    sys.stdout.flush()
    print()


def get_UUID_list(manifest_path):
    # The manifest is tab-separated; its 'id' column holds the file UUIDs
    UUID_list = pd.read_csv(manifest_path, sep='\t', encoding='utf-8')['id']
    return list(UUID_list)


def get_last_UUID(file_path):
    # UUID of the most recently modified file in the save folder (None if it is empty)
    dir_list = os.listdir(file_path)
    if not dir_list:
        return None
    dir_list = sorted(dir_list, key=lambda x: os.path.getmtime(os.path.join(file_path, x)))
    return dir_list[-1][:-4]  # drop the '.txt' suffix


def get_lastUUID_index(UUID_list, last_UUID):
    # Position of the last downloaded file; it is downloaded again because it
    # may be incomplete. Start from the beginning if there is no match.
    for i, UUID in enumerate(UUID_list):
        if UUID == last_UUID:
            return i
    return 0


def quit(signum, frame):
    # Exit cleanly on Ctrl+C (SIGINT) or SIGTERM
    print('You chose to stop me.')
    sys.exit()


if __name__ == '__main__':
    signal.signal(signal.SIGINT, quit)
    signal.signal(signal.SIGTERM, quit)
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--manifest", dest="M", type=str, required=True,
                        help="Manifest file path")
    parser.add_argument("-s", "--save", dest="S", type=str, default=os.curdir,
                        help="Which folder is the download file saved to?")
    args = parser.parse_args()
    # GDC API data endpoint; each file UUID is appended to it
    link = r'api.v/data/'
    # Command-line arguments
    manifest_path = args.M
    save_path = args.S
    os.makedirs(save_path, exist_ok=True)
    print("Save file to {}".format(save_path))
    UUID_list = get_UUID_list(manifest_path)
    last_UUID = get_last_UUID(save_path)
    print("Last download file {}".format(last_UUID))
    last_UUID_index = get_lastUUID_index(UUID_list, last_UUID)
    for UUID in UUID_list[last_UUID_index:]:
        # Build the URL by concatenation: os.path.join would insert backslashes on Windows
        url = link + UUID
        file_path = os.path.join(save_path, UUID + '.txt')
        download(url, file_path)
        print(f'{UUID} has been downloaded')
```
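Resuming relies on two conventions in the script: every file is saved as `<UUID>.txt`, and the most recently modified file in the save folder marks where the previous run stopped. The sketch below folds `get_last_UUID` and `get_lastUUID_index` into one hypothetical helper, `resume_index`, so the logic can be checked without downloading anything. Like the script, it restarts at the last file rather than skipping it, because that file may be only partially written; unlike the script, it ignores files without the `.txt` suffix.

```python
import os
import tempfile
import time


def resume_index(uuid_list, save_dir, suffix=".txt"):
    """Return the position in uuid_list where downloading should restart.

    The newest file in save_dir is treated as the last (possibly incomplete)
    download, so its position is returned and that file is fetched again.
    Returns 0 if the folder has no matching files or the UUID is not listed.
    """
    names = [n for n in os.listdir(save_dir) if n.endswith(suffix)]
    if not names:
        return 0
    newest = max(names, key=lambda n: os.path.getmtime(os.path.join(save_dir, n)))
    last_uuid = newest[: -len(suffix)]
    try:
        return uuid_list.index(last_uuid)
    except ValueError:
        return 0


if __name__ == "__main__":
    uuids = ["uuid-a", "uuid-b", "uuid-c"]
    with tempfile.TemporaryDirectory() as d:
        print(resume_index(uuids, d))  # empty folder -> 0
        now = time.time()
        for offset, uuid in enumerate(uuids[:2]):
            path = os.path.join(d, uuid + ".txt")
            with open(path, "w"):
                pass
            # Give each file a distinct modification time (older first)
            os.utime(path, (now + offset, now + offset))
        print(resume_index(uuids, d))  # newest file is uuid-b -> 1
```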
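Just run the following command in a terminal:
python tcga_download.py -m <manifest path> -s xxx
Explanation:
<manifest path> is the path to the manifest file you downloaded.
xxx is the folder where you want the downloaded files saved (preferably a newly created, empty folder).
Packaging the program as an EXE
Finally, for those who don't have Python installed, you can use the tool I packaged to download TCGA data. It is simple and convenient, a bit like the gdc-client tool, haha. Still, writing it myself gave me a sense of accomplishment. Later I plan to make a Qt GUI version so that a few mouse clicks are all you need.
I put it on a cloud drive; download it yourself if you need it.
Link:  Password: 3os4
GUI download EXE
A little point-and-click EXE for downloading. Download link: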
