Introduction to the Amazon Review Dataset -- 688IT编程网

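The Amazon Review Dataset records user reviews of products sold on Amazon. It is a classic dataset for recommender systems and has been updated over time. By release date, it comes in three versions: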

2013 version

2014 version

If the 2014 link redirects straight to the 2018 version, visit the alternative address instead.

2018 version

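The Amazon dataset is split by product category into sub-datasets such as Books, Electronics, Movies and TV, and CDs and Vinyl. Each sub-dataset contains two kinds of information.

Taking the 2014 version as an example: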

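1. Product metadata

asin: product ID

title: product name

price: price

imUrl: URL of the product image

related: related products

salesRank: sales rank information

brand: brand

categories: list of categories the product belongs to

Official example: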

    {
      "asin": "0000031852",
      "title": "Girls Ballet Tutu Zebra Hot Pink",
      "price": 3.17,
      "imUrl": "ecx.images-amazon/images/I/51fAmVkTbyL._SY300_.jpg",
      "related":
      {
        "also_bought": ["B00JHONN1S", "B002BZX8Z6"],
        "also_viewed": ["B002BZX8Z6", "B00JHONN1S"],
        "bought_together": ["B002BZX8Z6"]
      },
      "salesRank": {"Toys & Games": 211836},
      "brand": "Coxlures",
      "categories": [["Sports & Outdoors", "Other Sports", "Dance"]]
    }
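The nested fields (related, salesRank, categories) do not map directly onto CSV columns. As a rough sketch of one way to flatten them, the snippet below parses a trimmed copy of the example record. The helper name flatten_meta and the derived column names are hypothetical and not part of the dataset.

```python
import json

# Trimmed copy of the example record, with the "related" object closed explicitly.
record_text = '''
{
  "asin": "0000031852",
  "title": "Girls Ballet Tutu Zebra Hot Pink",
  "price": 3.17,
  "related": {
    "also_bought": ["B00JHONN1S", "B002BZX8Z6"],
    "also_viewed": ["B002BZX8Z6", "B00JHONN1S"],
    "bought_together": ["B002BZX8Z6"]
  },
  "salesRank": {"Toys & Games": 211836},
  "brand": "Coxlures",
  "categories": [["Sports & Outdoors", "Other Sports", "Dance"]]
}
'''


def flatten_meta(record):
    """Hypothetical helper: reduce one metadata record to flat, CSV-friendly fields."""
    related = record.get("related", {})
    sales_rank = record.get("salesRank", {})
    categories = record.get("categories", [])
    return {
        "asin": record["asin"],
        "price": record.get("price"),
        "brand": record.get("brand"),
        # Join the first category path into a single string.
        "category_path": " > ".join(categories[0]) if categories else None,
        # salesRank maps a category name to a rank; keep the smallest rank.
        "best_sales_rank": min(sales_rank.values()) if sales_rank else None,
        # Count related items instead of storing the full ID lists.
        "n_also_bought": len(related.get("also_bought", [])),
    }


flat = flatten_meta(json.loads(record_text))
print(flat)
```

Every field other than asin is read with .get, on the assumption that some records may lack it.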

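2. User rating records

reviewerID: user ID

asin: product ID

reviewerName: user name

helpful: helpfulness rating of the review, e.g. 2/3

reviewText: text of the review

overall: rating

summary: summary of the review

unixReviewTime: review time as a Unix timestamp

reviewTime: review time as a raw date string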

    {
      "reviewerID": "A2SUAM1J3GNN3B",
      "asin": "0000013714",
      "reviewerName": "J. McDonald",
      "helpful": [2, 3],
      "reviewText": "I bought this for my husband who plays the piano. He is having a wonderful time playing these old hymns. The music is at times hard to read because we think the book was published for singing from more than playing from. Great purchase though!",
      "overall": 5.0,
      "summary": "Heavenly Highway Hymns",
      "unixReviewTime": 1252800000,
      "reviewTime": "09 13, 2009"
    }
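Two fields in this record usually need a small conversion before use: helpful is a pair [helpful votes, total votes], and the review time appears both as a Unix timestamp and as a date string. The following is a minimal sketch of both conversions using the sample values above. The helper name helpful_ratio is hypothetical, and interpreting the timestamp as UTC is an assumption.

```python
from datetime import datetime, timezone

# Fields taken from the sample review above.
review = {
    "reviewerID": "A2SUAM1J3GNN3B",
    "asin": "0000013714",
    "helpful": [2, 3],
    "overall": 5.0,
    "unixReviewTime": 1252800000,
    "reviewTime": "09 13, 2009",
}


def helpful_ratio(helpful):
    """Hypothetical helper: turn [helpful_votes, total_votes] into a fraction."""
    helpful_votes, total_votes = helpful
    # Reviews without any votes have no meaningful ratio.
    return helpful_votes / total_votes if total_votes else None


# Assumption: the timestamp is interpreted as UTC.
date_from_unix = datetime.fromtimestamp(
    review["unixReviewTime"], tz=timezone.utc
).date()

# The raw string uses the "month day, year" layout shown in the sample.
date_from_text = datetime.strptime(review["reviewTime"], "%m %d, %Y").date()

print(helpful_ratio(review["helpful"]), date_from_unix, date_from_text)
```

For this record both date fields resolve to the same calendar day, so either one can be kept when the other is dropped.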

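Reading the Amazon dataset:

The downloaded data comes as JSON files, which are awkward to work with directly, so this section shows how to convert them to CSV. The 2014 Amazon Electronics dataset serves as the example.

Reading the product metadata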

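    import ast

    import pandas as pd

    file_path = 'meta_Electronics.json'
    useless_col = ['imUrl', 'salesRank', 'related', 'title', 'description']  # fields to drop

    df = {}
    with open(file_path, 'r') as fin:
        for i, line in enumerate(fin):
            # ast.literal_eval parses the dict literal on each line without
            # executing arbitrary code (unlike eval).
            d = ast.literal_eval(line)
            for s in useless_col:
                d.pop(s, None)  # drop the field if it is present
            df[i] = d

    # One row per record, then write the CSV without the index column.
    df = pd.DataFrame.from_dict(df, orient='index')
    df.to_csv('meta_Electronics.csv', index=False)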
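The code above evaluates each line as a Python literal instead of using a JSON parser. A minimal sketch of why that can matter follows, assuming the metadata lines are single-quoted Python dict literals. The sample line is made up for illustration, and parse_line is a hypothetical helper.

```python
import ast
import json


def parse_line(line):
    """Hypothetical helper: try strict JSON first, then a Python dict literal."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # ast.literal_eval accepts single-quoted strings and only parses
        # literals, so it does not execute code embedded in the line.
        return ast.literal_eval(line)


# Made-up line in the single-quoted style that json.loads rejects.
loose_line = "{'asin': '0000031852', 'price': 3.17, 'brand': 'Coxlures'}"
# A strict JSON line is handled by the first branch.
strict_line = '{"asin": "0000013714", "overall": 5.0}'

loose_record = parse_line(loose_line)
strict_record = parse_line(strict_line)
print(loose_record["brand"], strict_record["overall"])
```

Trying json.loads first keeps the common case fast and falls back only for lines that are not strict JSON.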

Reading the user rating records

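    import ast

    import pandas as pd

    file_path = 'Electronics_10.json'
    useless_col = ['reviewerName', 'reviewText', 'unixReviewTime', 'summary']  # fields to drop

    df = {}
    with open(file_path, 'r') as fin:
        for i, line in enumerate(fin):
            # Parse one review per line without executing arbitrary code.
            d = ast.literal_eval(line)
            for s in useless_col:
                d.pop(s, None)  # drop the field if it is present
            df[i] = d

    # One row per review, then write the CSV without the index column.
    df = pd.DataFrame.from_dict(df, orient='index')
    df.to_csv('Electronics_10.csv', index=False)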
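Both conversions above hold every record in memory before writing. For large files, a streaming variant may be preferable. The sketch below uses only the standard library and assumes each line is strict JSON. The helper name json_lines_to_csv is hypothetical, and an in-memory sample stands in for the real file.

```python
import csv
import io
import json


def json_lines_to_csv(lines, out, keep_cols):
    """Hypothetical helper: write selected fields of each JSON line to a CSV stream."""
    # extrasaction="ignore" silently skips fields that are not in keep_cols.
    writer = csv.DictWriter(out, fieldnames=keep_cols, extrasaction="ignore")
    writer.writeheader()
    n_rows = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        writer.writerow(json.loads(line))
        n_rows += 1
    return n_rows


# In-memory stand-in for a review file; real code would pass an open file object.
sample = io.StringIO(
    '{"reviewerID": "A2SUAM1J3GNN3B", "asin": "0000013714", "helpful": [2, 3], '
    '"overall": 5.0, "summary": "Heavenly Highway Hymns", "reviewTime": "09 13, 2009"}\n'
)
out = io.StringIO()
keep = ["reviewerID", "asin", "helpful", "overall", "reviewTime"]
n = json_lines_to_csv(sample, out, keep)
rows = list(csv.reader(io.StringIO(out.getvalue())))
print(n, rows)
```

With real data, lines would be the opened .json file and out a file opened with newline='' for writing. Listing the columns to keep, rather than the ones to drop, fixes the CSV header before the first record is read.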
