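Downloading Datasets with the Kaggle API
Kaggle API Tutorial
Beta release - Kaggle reserves the right to modify the API functionality currently offered.
Important: competition submissions made with API versions earlier than 1.5.0 may not work. If you have trouble submitting to a competition, check your version with kaggle --version. If it is below 1.5.0, update with pip install kaggle --upgrade.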
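1. Installing and Configuring Kaggle
1.1 Installing the Kaggle Package
Make sure you have Python 3 and the pip package manager installed.
Run the following command to access the Kaggle API from the command line:
pip install kaggle
(You may need to run pip install --user kaggle on Mac/Linux; this is recommended if problems come up during installation.) Installations done as the root user (that is, sudo pip install kaggle) will not work correctly unless you understand what you are doing, and even then they may not work. If you hit a permissions error, a user install is strongly recommended.
You can now use the kaggle command as shown in the examples below.
If you run into a kaggle: command not found error, make sure your Python binaries are on your path. You can see where kaggle is installed by running pip uninstall kaggle and checking the binary location it lists (answer n at the confirmation prompt so that nothing is removed). For a local user install on Linux, the default location is ~/.local/bin. On Windows, the default location is $PYTHON_HOME/Scripts.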
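The path check described above can be scripted. The following is a minimal sketch, assuming a Linux user install in ~/.local/bin (the default location named above); path_contains is a hypothetical helper written for this example, not part of the Kaggle CLI.

```shell
# Succeed if directory $2 is an entry of the colon-separated list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Default location of a `pip install --user` on Linux, as stated above.
user_bin="$HOME/.local/bin"

if ! path_contains "$PATH" "$user_bin"; then
    # Prepend for the current session only. Adding the same line to a
    # shell profile such as ~/.profile would make it permanent.
    export PATH="$user_bin:$PATH"
fi

# Print where the kaggle executable resolves, or a hint if it is still missing.
command -v kaggle || echo "kaggle not found on PATH"
```

Wrapping the entry with colons on both sides avoids false matches on directories that merely share a prefix, such as ~/.local/bin-old.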
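Important: Python 2 is not supported. Make sure you are using Python 3 before reporting any issues.
1.2 API Token Configuration
After registering, log in to Kaggle.
Click your avatar in the upper-right corner to open the account sidebar.
Click Your Profile to open the settings.
On that page, find the API section and click Create New Token. This triggers the download of kaggle.json, a file that holds your API credentials (your username and key) in JSON form.
Kaggle configuration
Install the Kaggle API on your machine: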
shell
pip install kaggle
Place this file at ~/.kaggle/kaggle.json.
If the directory does not exist, create a .kaggle folder in your home directory and put kaggle.json in it:
shell
cd ~
mkdir .kaggle
cd ~/.kaggle/
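(On Windows, the location is C:\Users\<Windows-username>\.kaggle\kaggle.json; you can check the exact location, without the drive, using echo %HOMEPATH%.) You can define a shell environment variable KAGGLE_CONFIG_DIR to change this location to $KAGGLE_CONFIG_DIR/kaggle.json (on Windows, %KAGGLE_CONFIG_DIR%\kaggle.json).
For your security, make sure other users of your computer do not have read access to your credentials. On Unix-based systems you can do this with the following command: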
chmod 600 ~/.kaggle/kaggle.json
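The placement and permission steps above can be combined into one function. This is a minimal sketch: install_kaggle_token is a hypothetical helper name, and the ~/Downloads path in the usage comment is only an assumption about where a browser saves the file.

```shell
# install_kaggle_token SRC [DEST_DIR]
# Copy a downloaded kaggle.json into the config directory and make it
# readable and writable by the owner only.
install_kaggle_token() {
    src=${1:-}
    # Use KAGGLE_CONFIG_DIR when it is set, otherwise the default ~/.kaggle.
    dest_dir=${2:-${KAGGLE_CONFIG_DIR:-$HOME/.kaggle}}

    if [ ! -f "$src" ]; then
        echo "token file not found: $src" >&2
        return 1
    fi

    # -p: no error if the directory already exists.
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/kaggle.json" || return 1
    # Same permission as the chmod 600 command above.
    chmod 600 "$dest_dir/kaggle.json"
}

# Example call (hypothetical download location):
# install_kaggle_token "$HOME/Downloads/kaggle.json"
```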
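You can also choose to export your Kaggle username and token to the environment:
export KAGGLE_USERNAME=datadinosaur
export KAGGLE_KEY=xxxxxxxxxxxxxx
In addition, you can export any other configuration value that would normally live in $HOME/.kaggle/kaggle.json, using the format "KAGGLE_<NAME>" (note the uppercase).
For example, if the file had the variable "proxy", you would export KAGGLE_PROXY and the client would pick it up.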
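Because credentials can come from either the environment or kaggle.json, a script may want to confirm up front that one of the two is present. The following is a minimal sketch: kaggle_credentials_source is a hypothetical helper name, and the order in which it checks the two sources is a choice made for this example, not a statement about how the client resolves them.

```shell
# Print where Kaggle credentials were found: "environment", the path of
# kaggle.json, or "none". Returns non-zero when nothing was found.
kaggle_credentials_source() {
    # Source 1: both variables are set in the environment.
    if [ -n "${KAGGLE_USERNAME:-}" ] && [ -n "${KAGGLE_KEY:-}" ]; then
        echo "environment"
        return 0
    fi

    # Source 2: kaggle.json in KAGGLE_CONFIG_DIR, or ~/.kaggle by default.
    config_dir=${KAGGLE_CONFIG_DIR:-$HOME/.kaggle}
    if [ -f "$config_dir/kaggle.json" ]; then
        echo "$config_dir/kaggle.json"
        return 0
    fi

    echo "none"
    return 1
}
```

A script could call kaggle_credentials_source before its first kaggle command and stop early with a clear message when it returns non-zero.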
2. Using the Kaggle Commands
The command-line tool supports the following commands:
shell
kaggle competitions {list, files, download, submit, submissions, leaderboard}
kaggle datasets {list, files, download, create, version, init, metadata, status}
kaggle kernels {list, init, push, pull, output, status}
kaggle config {view, set, unset}
See the sections below for details on each of these commands.
2.1 Competitions
The API supports the following commands for Kaggle Competitions.
shell
usage: kaggle competitions [-h]
{list,files,download,submit,submissions,leaderboard}
...
optional arguments:
-h, --help show this help message and exit
commands:
{list,files,download,submit,submissions,leaderboard}
list List available competitions
files List competition files
download Download competition files
submit Make a new competition submission
submissions Show your competition submissions
leaderboard Get competition leaderboard information
2.1.1 List competitions
shell
usage: kaggle competitions list [-h] [--group GROUP] [--category CATEGORY] [--sort-by SORT_BY] [-p PAGE] [-s SEARCH] [-v]
optional arguments:
-h, --help show this help message and exit
--group GROUP Search for competitions in a specific group. Default is 'general'. Valid options are 'general', 'entered', and 'inClass'
--category CATEGORY Search for competitions of a specific category. Default is 'all'. Valid options are 'all', 'featured', 'research', 'recruitment', 'gettingStarted', 'masters', and 'playground'
--sort-by SORT_BY Sort list results. Default is 'latestDeadline'. Valid options are 'grouped', 'prize', 'earliestDeadline', 'latestDeadline', 'numberOfTeams', and 'recentlyCreated'
-p PAGE, --page PAGE Page number for results paging. Page size is 20 by default
-s SEARCH, --search SEARCH
Term(s) to search for
-v, --csv Print results in CSV format
(if not set print in table format)
Examples:
shell
kaggle competitions list -s health
shell
kaggle competitions list --category gettingStarted
2.1.2 List competition files
shell
usage: kaggle competitions files [-h] [-v] [-q] [competition]
optional arguments:
-h, --help show this help message and exit
competition Competition URL suffix (use "kaggle competitions list" to show options)
If empty, the default competition will be used (use "kaggle config set competition")"
-v, --csv Print results in CSV format (if not set print in table format)
-q, --quiet Suppress printing information about the upload/download progress
Example:
shell
kaggle competitions files favorita-grocery-sales-forecasting
2.1.3 Download competition files
shell
usage: kaggle competitions download [-h] [-f FILE_NAME] [-p PATH] [-w] [-o]
[-q]
[competition]
optional arguments:
-h, --help show this help message and exit
competition Competition URL suffix (use "kaggle competitions list" to show options)
If empty, the default competition will be used (use "kaggle config set competition")"
-f FILE_NAME, --file FILE_NAME
File name, all files downloaded if not provided
(use "kaggle competitions files -c <competition>" to show options)
-p PATH, --path PATH Folder where file(s) will be downloaded, defaults to current working directory
-w, --wp Download files to current working path
-o, --force Skip check whether local version of file is up to date, force file download
-q, --quiet Suppress printing information about the upload/download progress
Examples:
kaggle competitions download favorita-grocery-sales-forecasting
kaggle competitions download favorita-grocery-sales-forecasting -f test.csv.7z
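For repeatable setups it can help to wrap the download in a small function that keeps each competition in its own folder. The following is a minimal sketch that uses only the positional competition argument and the -p flag from the usage text above; fetch_competition and the data/<competition> layout are assumptions of this example.

```shell
# fetch_competition COMPETITION [DEST_DIR]
# Download all files of a competition into DEST_DIR
# (default: data/COMPETITION under the current directory).
fetch_competition() {
    competition=${1:-}
    if [ -z "$competition" ]; then
        echo "usage: fetch_competition COMPETITION [DEST_DIR]" >&2
        return 1
    fi
    dest=${2:-data/$competition}

    mkdir -p "$dest" || return 1
    # -p sets the folder that the files are downloaded into.
    kaggle competitions download "$competition" -p "$dest" || return 1
    # List what arrived so the result can be checked at a glance.
    ls -l "$dest"
}

# Example call:
# fetch_competition favorita-grocery-sales-forecasting
```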
2.1.4 Submit to a competition
shell
usage: kaggle competitions submit [-h] -f FILE_NAME -m MESSAGE [-q]
[competition]
required arguments:
-f FILE_NAME, --file FILE_NAME
File for upload (full path)
-m MESSAGE, --message MESSAGE
Message describing this submission
optional arguments:
-h, --help show this help message and exit
competition Competition URL suffix (use "kaggle competitions list" to show options)
If empty, the default competition will be used (use "kaggle config set competition")"
-q, --quiet Suppress printing information about the upload/download progress
Example:
kaggle competitions submit favorita-grocery-sales-forecasting -f sample_submission_favorita.csv.7z -m "My submission message"
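Since -f and -m are both required, a wrapper can catch a missing file or an empty message before anything is uploaded. The following is a minimal sketch; submit_checked is a hypothetical helper name, and listing the submissions afterwards is simply one way to confirm the result.

```shell
# submit_checked COMPETITION FILE MESSAGE
# Validate the arguments locally, submit, then list the submissions.
submit_checked() {
    competition=${1:-}
    file=${2:-}
    message=${3:-}

    if [ ! -f "$file" ]; then
        echo "submission file not found: $file" >&2
        return 1
    fi
    if [ -z "$message" ]; then
        echo "a submission message is required (-m)" >&2
        return 1
    fi

    kaggle competitions submit "$competition" -f "$file" -m "$message" || return 1
    # Show the submissions list so the new entry can be confirmed.
    kaggle competitions submissions "$competition"
}

# Example call:
# submit_checked favorita-grocery-sales-forecasting sample_submission_favorita.csv.7z "My submission message"
```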
2.1.5 List competition submissions
shell
usage: kaggle competitions submissions [-h] [-v] [-q] [competition]
optional arguments:
-h, --help show this help message and exit
competition Competition URL suffix (use "kaggle competitions list" to show options)
If empty, the default competition will be used (use "kaggle config set competition")"
-v, --csv Print results in CSV format (if not set print in table format)
-q, --quiet Suppress printing information about the upload/download progress
Example:
kaggle competitions submissions favorita-grocery-sales-forecasting
2.1.6 Get the competition leaderboard
shell
usage: kaggle competitions leaderboard [-h] [-s] [-d] [-p PATH] [-v] [-q]
[competition]
optional arguments:
-h, --help show this help message and exit
competition Competition URL suffix (use "kaggle competitions list" to show options)
If empty, the default competition will be used (use "kaggle config set competition")"
-s, --show Show the top of the leaderboard
-d, --download Download entire leaderboard
-p PATH, --path PATH Folder where file(s) will be downloaded, defaults to current working directory
-v, --csv Print results in CSV format (if not set print in table format)
-q, --quiet Suppress printing information about the upload/download progress
Example:
shell
kaggle competitions leaderboard favorita-grocery-sales-forecasting -s
2.2 Datasets
The API supports the following commands for Kaggle Datasets.
shell
usage: kaggle datasets [-h]
{list,files,download,create,version,init,metadata,status} ...
optional arguments:
-h, --help show this help message and exit
commands:
{list,files,download,create,version,init,metadata,status}
list List available datasets
files List dataset files
download Download dataset files
create Create a new dataset
version Create a new dataset version
init Initialize metadata file for dataset creation
metadata Download metadata about a dataset
status Get the creation status for a dataset
2.2.1 List datasets
shell
usage: kaggle datasets list [-h] [--sort-by SORT_BY] [--size SIZE] [--file-type FILE_TYPE] [--license LICENSE_NAME] [--tags TAG_IDS] [-s SEARCH] [-m] [--user USER] [-p PAGE] [-v]
optional arguments:
-h, --help show this help message and exit
--sort-by SORT_BY Sort list results. Default is 'hottest'. Valid options are 'hottest', 'votes', 'updated', and 'active'
--size SIZE Search for datasets of a specific size. Default is 'all'. Valid options are 'all', 'small', 'medium', and 'large'
--file-type FILE_TYPE Search for datasets with a specific file type. Default is 'all'. Valid options are 'all', 'csv', 'sqlite', 'json', and 'bigQuery'. Please note that bigQuery datasets cannot be downloaded
--license LICENSE_NAME
Search for datasets with a specific license. Default is 'all'. Valid options are 'all', 'cc', 'gpl', 'odb', and 'other'
--tags TAG_IDS Search for datasets that have specific tags. Tag list should be comma separated
-s SEARCH, --search SEARCH
Term(s) to search for
-m, --mine Display only my items
--user USER Find public datasets owned by a specific user or organization
-p PAGE, --page PAGE Page number for results paging. Page size is 20 by default
-v, --csv Print results in CSV format (if not set print in table format)
Examples:
shell
kaggle datasets list -s demographics
kaggle datasets list --sort-by votes
2.2.2 List files for a dataset
shell
usage: kaggle datasets files [-h] [-v] [dataset]
optional arguments:
-h, --help show this help message and exit
dataset Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)
-v, --csv Print results in CSV format (if not set print in table format)
Example:
shell
kaggle datasets files zillow/zecon
2.2.3 Download dataset files
shell
usage: kaggle datasets download [-h] [-f FILE_NAME] [-p PATH] [-w] [--unzip]
[-o] [-q]
[dataset]
optional arguments:
-h, --help show this help message and exit
dataset Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)
-f FILE_NAME, --file FILE_NAME
File name, all files downloaded if not provided
(use "kaggle datasets files -d <dataset>" to show options)
-p PATH, --path PATH Folder where file(s) will be downloaded, defaults to current working directory
-w, --wp Download files to current working path
--unzip Unzip the downloaded file. Will delete the zip file when completed.
-o, --force Skip check whether local version of file is up to date, force file download
-q, --quiet Suppress printing information about the upload/download progress
Examples:
shell
kaggle datasets download zillow/zecon
kaggle datasets download zillow/zecon -f State_time_series.csv
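Note that BigQuery datasets cannot be downloaded.
On the dataset's page, find the API command option and copy the command to the clipboard.
For example, the command for the large-car-dataset dataset is: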
shell
kaggle datasets download -d cisautomotiveapi/large-car-dataset
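The download options listed above combine naturally: -p chooses the target folder and --unzip extracts the archive and removes the zip. The following is a minimal sketch; fetch_dataset and the default data folder are assumptions of this example, and the format check only mirrors the <owner>/<dataset-name> form described in the usage text.

```shell
# fetch_dataset OWNER/NAME [DEST_DIR]
# Download a dataset into DEST_DIR (default: ./data) and unzip it.
fetch_dataset() {
    dataset=${1:-}
    dest=${2:-data}

    # Require the <owner>/<dataset-name> form before calling the CLI.
    case $dataset in
        ?*/?*) ;;
        *)
            echo "expected <owner>/<dataset-name>, got: $dataset" >&2
            return 1
            ;;
    esac

    mkdir -p "$dest" || return 1
    # --unzip extracts the files and deletes the downloaded zip afterwards.
    kaggle datasets download "$dataset" -p "$dest" --unzip
}

# Example call:
# fetch_dataset zillow/zecon data/zecon
```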
2.2.4 Initialize a metadata file for dataset creation
shell
usage: kaggle datasets init [-h] [-p FOLDER]
optional arguments:
-h, --help show this help message and exit
-p FOLDER, --path FOLDER
Folder for upload, containing data files and a special dataset-metadata.json file (github/Kaggle/kaggle-api/wiki/Dataset-Metadata). Defaults to current working directory
Example:
shell
kaggle datasets init -p /path/to/dataset
2.2.5 Create a new dataset
To create a new dataset, you first need to initialize the metadata file. You can do this by running kaggle datasets init as described above.
shell
usage: kaggle datasets create [-h] [-p FOLDER] [-u] [-q] [-t] [-r {skip,zip,tar}]
optional arguments:
-h, --help show this help message and exit
-p FOLDER, --path FOLDER
Folder for upload, containing data files and a special dataset-metadata.json file (github/Kaggle/kaggle-api/wiki/Dataset-Metadata). Defaults to current working directory
-u, --public Create publicly (default is private)
-q, --quiet Suppress printing information about the upload/download progress
-t, --keep-tabular Do not convert tabular files to CSV (default is to convert)
-r {skip,zip,tar}, --dir-mode {skip,zip,tar}
What to do with directories: "skip" - ignore; "zip" - compressed upload; "tar" - uncompressed upload
Example:
shell
kaggle datasets create -p /path/to/dataset
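The upload folder must contain dataset-metadata.json before create is run. As an alternative to editing the file that kaggle datasets init generates, the file can be written directly. The following is a minimal sketch under stated assumptions: write_dataset_metadata is a hypothetical helper, and the three fields (title, id, licenses) and the CC0-1.0 license name are assumptions based on the template that init produces, so they should be checked against the Dataset-Metadata wiki page referenced in the usage text.

```shell
# write_dataset_metadata FOLDER OWNER SLUG TITLE
# Write a minimal dataset-metadata.json into FOLDER.
# Values are inserted verbatim, so they must not contain double quotes
# or backslashes (this sketch does no JSON escaping).
write_dataset_metadata() {
    folder=${1:-}
    owner=${2:-}
    slug=${3:-}
    title=${4:-}

    if [ -z "$folder" ] || [ -z "$owner" ] || [ -z "$slug" ] || [ -z "$title" ]; then
        echo "usage: write_dataset_metadata FOLDER OWNER SLUG TITLE" >&2
        return 1
    fi

    mkdir -p "$folder" || return 1
    # Assumed field names; see the lead-in above.
    cat > "$folder/dataset-metadata.json" <<EOF
{
  "title": "$title",
  "id": "$owner/$slug",
  "licenses": [{"name": "CC0-1.0"}]
}
EOF
}

# Example (hypothetical owner and slug), followed by the create command above:
# write_dataset_metadata /path/to/dataset someuser my-dataset "My Dataset"
# kaggle datasets create -p /path/to/dataset
```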
2.2.6 Create a new dataset version
shell
usage: kaggle datasets version [-h] -m VERSION_NOTES [-p FOLDER] [-q] [-t]
[-r {skip,zip,tar}] [-d]
required arguments:
-m VERSION_NOTES, --message VERSION_NOTES
Message describing the new version
optional arguments:
-h, --help show this help message and exit
-p FOLDER, --path FOLDER
Folder for upload, containing data files and a special dataset-metadata.json file (github/Kaggle/kaggle-api/wiki/Dataset-Metadata). Defaults to current working directory
-q, --quiet Suppress printing information about the upload/download progress
-t, --keep-tabular Do not convert tabular files to CSV (default is to convert)
-r {skip,zip,tar}, --dir-mode {skip,zip,tar}
What to do with directories: "skip" - ignore; "zip" - compressed upload; "tar" - uncompressed upload
-d, --delete-old-versions
Delete old versions of this dataset
Example:
shell
kaggle datasets version -p /path/to/dataset -m "Updated data"
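When a dataset is refreshed regularly, the version message can be generated rather than typed. The following is a minimal sketch; publish_version is a hypothetical helper name, and stamping the note with the current date is a convention chosen for this example.

```shell
# publish_version FOLDER [NOTE]
# Create a new dataset version from FOLDER. The default note is
# "Updated data" followed by today's date.
publish_version() {
    folder=${1:-}
    note=${2:-Updated data $(date +%Y-%m-%d)}

    # The usage text above requires the folder to hold dataset-metadata.json.
    if [ ! -f "$folder/dataset-metadata.json" ]; then
        echo "dataset-metadata.json not found in: $folder" >&2
        return 1
    fi

    kaggle datasets version -p "$folder" -m "$note"
}

# Example call:
# publish_version /path/to/dataset
```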
2.2.7 Download metadata for an existing dataset
shell
usage: kaggle datasets metadata [-h] [-p PATH] [dataset]
optional arguments:
-h, --help show this help message and exit