Handling JSON in Python (repost) -- 688IT编程网


Concepts

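Serialization: the process of converting an object's state into a form that can be stored or transmitted over a network; the format can be JSON, XML, and so on. Deserialization is the reverse: reading the serialized state back from storage (JSON, XML) and recreating the object.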

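JSON (JavaScript Object Notation): a lightweight data-interchange format. Compared with XML it is simpler, easy for people to read and write, and easy for machines to parse and generate. JSON's syntax is based on a subset of JavaScript.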

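The json module has been included since Python 2.6, so no separate download is needed. The module calls serialization and deserialization encoding and decoding, respectively:

encoding: converts a Python object into a JSON string

decoding: converts a JSON-formatted string into a Python object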

Simple data types (string, unicode, int, float, list, tuple, dict) can be handled directly.
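As a minimal sketch of the encoding/decoding round trip described above, the following is written for Python 3 (unlike the Python 2 listings in this article). The sample record is a made-up value used only for illustration.

```python
import json

# Hypothetical sample record, used only for illustration.
record = {"name": "example", "tags": ["a", "b"], "score": 1.5}

encoded = json.dumps(record)   # encoding: Python object -> JSON string
decoded = json.loads(encoded)  # decoding: JSON string -> Python object

print(type(encoded).__name__)  # str
print(decoded == record)       # True
```

The round trip is lossless here because the record contains only types that JSON can represent directly.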

Encoding simple data types with json.dumps:

import json

data = [{'a':"A",'b':(2,4),'c':3.0}]  # a list object
print "DATA:",repr(data)
data_string = json.dumps(data)
print "JSON:",data_string

Output:

DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]  # Python 2 dicts do not preserve key order
JSON: [{"a": "A", "c": 3.0, "b": [2, 4]}]

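The JSON output looks very similar to DATA, apart from a few subtle changes such as the Python tuple becoming a JSON array. The Python-to-JSON encoding rules are:

Python -> JSON
dict -> object
list, tuple -> array
str, unicode -> string
int, long, float -> number
True -> true
False -> false
None -> null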
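The encoding rules can be checked directly by encoding one value of each kind. This is a small sketch for Python 3, where str takes the place of Python 2's str/unicode and int covers long; the sample values are arbitrary.

```python
import json

# One arbitrary sample value per Python type covered by the encoding rules.
samples = [
    {"k": 1},   # dict  -> object
    [1, 2],     # list  -> array
    (1, 2),     # tuple -> array
    "text",     # str   -> string
    7,          # int   -> number
    2.5,        # float -> number
    True,       # True  -> true
    False,      # False -> false
    None,       # None  -> null
]

encoded = [json.dumps(value) for value in samples]
for value, text in zip(samples, encoded):
    print(repr(value), "->", text)
```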

Decoding simple data types with json.loads:

import json

data = [{'a':"A",'b':(2,4),'c':3.0}]  # a list object
data_string = json.dumps(data)
print "ENCODED:",data_string
decoded = json.loads(data_string)
print "DECODED:",decoded
print "ORIGINAL:",type(data[0]['b'])
print "DECODED:",type(decoded[0]['b'])

Output:

ENCODED: [{"a": "A", "c": 3.0, "b": [2, 4]}]
DECODED: [{u'a': u'A', u'c': 3.0, u'b': [2, 4]}]
ORIGINAL: <type 'tuple'>
DECODED: <type 'list'>

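During decoding, the JSON array becomes a Python list rather than the original tuple. The JSON-to-Python decoding rules are:

JSON -> Python
object -> dict
array -> list
string -> unicode
number (int) -> int, long
number (real) -> float
true -> True
false -> False
null -> None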
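The decoding rules can be observed by decoding one JSON value of each kind and inspecting the resulting Python types. This sketch assumes Python 3, so JSON strings come back as str rather than unicode; the JSON text is a made-up example.

```python
import json

# Made-up JSON document with one value of each JSON kind.
text = '{"obj": {"x": 1}, "arr": [1, 2], "s": "hi", "i": 3, "f": 0.5, "t": true, "n": null}'

decoded = json.loads(text)
kinds = {key: type(value).__name__ for key, value in decoded.items()}
print(kinds)

# A tuple does not survive a round trip: it comes back as a list.
round_tripped = json.loads(json.dumps((2, 4)))
print(round_tripped)
```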

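Human-friendly JSON output

The encoded JSON string is compact and its keys come out in no particular order, so dumps provides optional parameters that make the output more readable. For example, sort_keys tells the encoder to emit dictionary keys in sorted order (a to z).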

import json

data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
print 'DATA:', repr(data)
unsorted = json.dumps(data)
print 'JSON:', unsorted
print 'SORT:', json.dumps(data, sort_keys=True)

Output:

DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
JSON: [{"a": "A", "c": 3.0, "b": [2, 4]}]
SORT: [{"a": "A", "b": [2, 4], "c": 3.0}]
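One practical use of sort_keys is producing the same string for dictionaries that hold the same data in a different key order. The sketch below assumes Python 3.7 or later, where dicts keep insertion order and json.dumps follows it; the two dictionaries are arbitrary examples.

```python
import json

# Same contents, different insertion order (arbitrary example data).
first = {"b": 1, "a": 2}
second = {"a": 2, "b": 1}

# Without sort_keys the output follows insertion order, so the strings differ.
plain_equal = json.dumps(first) == json.dumps(second)

# With sort_keys the output is canonical with respect to key order.
sorted_equal = json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True)

print(plain_equal, sorted_equal)
```

This makes sorted output convenient for comparing, hashing, or diffing encoded data.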

The indent parameter indents the output according to the structure of the data, which makes it easier to read:

import json

data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
print 'DATA:', repr(data)
print 'NORMAL:', json.dumps(data, sort_keys=True)
print 'INDENT:', json.dumps(data, sort_keys=True, indent=2)

Output:

DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
NORMAL: [{"a": "A", "b": [2, 4], "c": 3.0}]
INDENT: [
  {
    "a": "A",
    "b": [
      2,
      4
    ],
    "c": 3.0
  }
]

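The separators parameter removes the spaces after "," and ":". In the output above, every "," and ":" is followed by a space, which is there to make the output easier to read. When transmitting data, however, the more compact the better, so the redundant whitespace can be dropped by passing separators: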

import json

data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
print 'DATA:', repr(data)
print 'repr(data) :', len(repr(data))
print 'dumps(data) :', len(json.dumps(data))
print 'dumps(data, indent=2) :', len(json.dumps(data, indent=2))
print 'dumps(data, separators):', len(json.dumps(data, separators=(',',':')))

Output:

DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
repr(data) : 35
dumps(data) : 35
dumps(data, indent=2) : 76
dumps(data, separators): 29
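The same comparison can be sketched in Python 3, checking along the way that all three forms decode to the same data. The exact length of the indented form may differ from the Python 2 figure above, so only the ordering of the sizes is relied on here.

```python
import json

data = [{"a": "A", "b": (2, 4), "c": 3.0}]

default = json.dumps(data, sort_keys=True)
pretty = json.dumps(data, sort_keys=True, indent=2)
compact = json.dumps(data, sort_keys=True, separators=(",", ":"))

print(compact)
print(len(compact), len(default), len(pretty))

# Formatting options change only the whitespace, never the decoded value.
same_content = json.loads(compact) == json.loads(default) == json.loads(pretty)
print(same_content)
```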

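The skipkeys parameter: during encoding, dict keys must be strings (basic types such as int, float, bool and None are also accepted and are converted to strings). A key of any other type, such as a tuple, raises a TypeError during encoding. Passing skipkeys=True makes the encoder skip such keys instead of raising.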

import json

data= [ { 'a':'A', 'b':(2, 4), 'c':3.0, ('d',):'D tuple' } ]

try:
    print json.dumps(data)
except (TypeError, ValueError) as err:
    print 'ERROR:', err

print json.dumps(data, skipkeys=True)

Output:

ERROR: keys must be a string
[{"a": "A", "c": 3.0, "b": [2, 4]}]
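A short Python 3 sketch of the same behavior, which also shows that an int key is converted to a string rather than rejected. The dictionary is an arbitrary example, and the exact wording of the error message varies between Python versions.

```python
import json

# Arbitrary example: one str key, one int key, one tuple key.
data = {"a": 1, 2: "int key", ("d",): "tuple key"}

raised = False
try:
    json.dumps(data)
except TypeError as err:
    raised = True
    print("ERROR:", err)

# With skipkeys=True the tuple key is silently dropped; the int key becomes "2".
skipped = json.dumps(data, skipkeys=True)
print(skipped)
```

Because skipped keys disappear without warning, skipkeys is best reserved for cases where losing those entries is acceptable.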

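Making json support custom data types

The examples above all use Python's built-in types. By default the json module cannot handle custom data structures and raises an exception: TypeError: xx is not JSON serializable. In that case you need to write a conversion function: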

import json

class MyObj(object):
    def __init__(self, s):
        self.s = s
    def __repr__(self):
        return '<MyObj(%s)>' % self.s

obj = MyObj('helloworld')

try:
    print json.dumps(obj)
except TypeError as err:
    print 'ERROR:', err

# Conversion function
def convert_to_builtin_type(obj):
    print 'default(', repr(obj), ')'
    # Convert the MyObj instance into a dict
    d = { '__class__':obj.__class__.__name__,
          '__module__':obj.__module__,
        }
    d.update(obj.__dict__)
    return d

print json.dumps(obj, default=convert_to_builtin_type)

Output:

ERROR: <MyObj(helloworld)> is not JSON serializable
default( <MyObj(helloworld)> )
{"s": "helloworld", "__module__": "__main__", "__class__": "MyObj"}

# Note: the __class__ and __module__ values depend on where your code is located
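A Python 3 variant of the conversion function, sketched under the assumption that only MyObj instances should be converted. For anything else it raises TypeError, which is what json expects a default function to do for unsupported values.

```python
import json

class MyObj:
    def __init__(self, s):
        self.s = s

def convert_to_builtin_type(obj):
    # Convert only the types we know about.
    if isinstance(obj, MyObj):
        d = {"__class__": type(obj).__name__, "__module__": type(obj).__module__}
        d.update(vars(obj))
        return d
    # Signal every other type as unsupported.
    raise TypeError("%s is not JSON serializable" % type(obj).__name__)

encoded = json.dumps({"item": MyObj("helloworld")},
                     default=convert_to_builtin_type, sort_keys=True)
print(encoded)
```

The default function is called only for values the encoder cannot handle itself, so built-in types nested alongside MyObj are encoded as usual.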

Conversely, to decode JSON into a Python object you also need a custom conversion function, passed to json.loads through the object_hook parameter:

#jsontest.py
import json

class MyObj(object):
    def __init__(self,s):
        self.s = s
    def __repr__(self):
        return "<MyObj(%s)>" % self.s

def dict_to_object(d):
    if '__class__' in d:
        class_name = d.pop('__class__')
        module_name = d.pop('__module__')
        module = __import__(module_name)
        print "MODULE:",module
        class_ = getattr(module,class_name)
        print "CLASS",class_
        # Keyword argument names must be byte strings in Python 2
        args = dict((key.encode('ascii'),value) for key,value in d.items())
        print 'INSTANCE ARGS:',args
        inst = class_(**args)
    else:
        inst = d
    return inst

encoded_object = '[{"s":"helloworld","__module__":"jsontest","__class__":"MyObj"}]'
myobj_instance = json.loads(encoded_object,object_hook=dict_to_object)
print myobj_instance

Output:

MODULE: <module 'jsontest' from 'E:\Users\liuzhijun\workspace\python\jsontest.py'>
CLASS <class 'jsontest.MyObj'>
INSTANCE ARGS: {'s': u'helloworld'}
[<MyObj(helloworld)>]

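Importing whatever module name appears in the JSON text means the input decides which code gets loaded, which is risky when the data is not trusted. The following Python 3 sketch swaps in a different technique: a lookup table of allowed classes instead of __import__. The KNOWN_CLASSES name is hypothetical and not part of the json module.

```python
import json

class MyObj:
    def __init__(self, s):
        self.s = s

    def __repr__(self):
        return "<MyObj(%s)>" % self.s

# Hypothetical whitelist: only classes listed here may be rebuilt from JSON.
KNOWN_CLASSES = {"MyObj": MyObj}

def dict_to_object(d):
    class_name = d.get("__class__")
    if isinstance(class_name, str) and class_name in KNOWN_CLASSES:
        # Everything except the bookkeeping keys becomes a constructor argument.
        args = {k: v for k, v in d.items() if k not in ("__class__", "__module__")}
        return KNOWN_CLASSES[class_name](**args)
    return d  # ordinary JSON objects stay plain dicts

encoded_object = '[{"s": "helloworld", "__module__": "jsontest", "__class__": "MyObj"}]'
result = json.loads(encoded_object, object_hook=dict_to_object)
print(result)  # [<MyObj(helloworld)>]
```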

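Using the Encoder and Decoder classes for JSON conversion

JSONEncoder has an iterable interface, iterencode(data), which returns a series of encoded chunks. Its advantage is that each chunk can be written to a file or network stream as it is produced, without building the whole encoded string in memory first.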

import json

encoder = json.JSONEncoder()
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]

for part in encoder.iterencode(data):
    print 'PART:', part

Output:

PART: [
PART: {
PART: "a"
PART: :
PART: "A"
PART: ,
PART: "c"
PART: :
PART: 3.0
PART: ,
PART: "b"
PART: :
PART: [2
PART: , 4
PART: ]
PART: }
PART: ]
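A sketch of how the chunks from iterencode might be written to a stream one at a time, in Python 3. An io.StringIO object stands in for a real file or network stream here, and how the output is split into chunks is an implementation detail that should not be relied on.

```python
import io
import json

data = [{"a": "A", "b": (2, 4), "c": 3.0}]
encoder = json.JSONEncoder(sort_keys=True)

stream = io.StringIO()  # stand-in for a file or socket wrapper
chunk_count = 0
for part in encoder.iterencode(data):
    stream.write(part)  # each chunk is written as soon as it is produced
    chunk_count += 1

streamed = stream.getvalue()
print(streamed)
```

Joining the chunks gives the same text that encoder.encode(data) returns in one piece.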

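The encode() method is essentially equivalent to ''.join(encoder.iterencode(data)), with some error checking done up front (for example, for non-string dict keys). For custom objects we only need to override the default() method of JSONEncoder; its implementation is similar to the convert_to_builtin_type() function shown earlier.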

import json

class MyObj(object):
    def __init__(self,s):
        self.s = s
    def __repr__(self):
        return "<MyObj(%s)>" % self.s

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        print 'default(', repr(obj), ')'
        # Convert objects to a dictionary of their representation
        d = { '__class__':obj.__class__.__name__,
              '__module__':obj.__module__,
            }
        d.update(obj.__dict__)
        return d

obj = MyObj('helloworld')
print obj
print MyEncoder().encode(obj)

Output:

<MyObj(helloworld)>
default( <MyObj(helloworld)> )
{"s": "helloworld", "__module__": "__main__", "__class__": "MyObj"}
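An encoder subclass can also be handed to json.dumps through its cls parameter instead of being instantiated by hand. The Python 3 sketch below assumes that only MyObj needs special handling and defers to the base class for everything else, so unsupported types still raise TypeError.

```python
import json

class MyObj:
    def __init__(self, s):
        self.s = s

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, MyObj):
            d = {"__class__": type(obj).__name__}
            d.update(vars(obj))
            return d
        # Let the base class raise TypeError for anything else.
        return super().default(obj)

items = [MyObj("helloworld"), 1]
encoded = MyEncoder(sort_keys=True).encode(items)
via_cls = json.dumps(items, cls=MyEncoder, sort_keys=True)
print(encoded)
```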

Converting from JSON to Python objects:

# Assumes this code lives in jsontest.py next to the MyObj class defined earlier
import json

class MyDecoder(json.JSONDecoder):
    def __init__(self):
        json.JSONDecoder.__init__(self, object_hook=self.dict_to_object)

    def dict_to_object(self, d):
        if '__class__' in d:
            class_name = d.pop('__class__')
            module_name = d.pop('__module__')
            module = __import__(module_name)
            print 'MODULE:', module
            class_ = getattr(module, class_name)
            print 'CLASS:', class_
            # Keyword argument names must be byte strings in Python 2
            args = dict( (key.encode('ascii'), value) for key, value in d.items())
            print 'INSTANCE ARGS:', args
            inst = class_(**args)
        else:
            inst = d
        return inst

encoded_object = '[{"s": "helloworld", "__module__": "jsontest", "__class__": "MyObj"}]'
myobj_instance = MyDecoder().decode(encoded_object)
print myobj_instance

Output:

MODULE: <module 'jsontest' from 'E:\Users\liuzhijun\workspace\python\jsontest.py'>
CLASS: <class 'jsontest.MyObj'>
INSTANCE ARGS: {'s': u'helloworld'}
[<MyObj(helloworld)>]
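A decoder subclass can likewise be passed to json.loads through its cls parameter. This Python 3 sketch keeps things small by recognizing only the "MyObj" class name directly rather than importing modules by name; that simplification is an assumption of the example, not what the listing above does.

```python
import json

class MyObj:
    def __init__(self, s):
        self.s = s

    def __repr__(self):
        return "<MyObj(%s)>" % self.s

class MyDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        # Install our hook unless the caller supplied one explicitly.
        kwargs.setdefault("object_hook", self.dict_to_object)
        super().__init__(**kwargs)

    def dict_to_object(self, d):
        if d.get("__class__") == "MyObj":
            return MyObj(d["s"])
        return d

encoded_object = '[{"s": "helloworld", "__class__": "MyObj"}]'
decoded = json.loads(encoded_object, cls=MyDecoder)
print(decoded)  # [<MyObj(helloworld)>]
```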

Writing JSON strings to a file stream

The examples above all work in memory. For large amounts of data it is more appropriate to encode to a file-like object, which is what the load() and dump() functions are for.

import json
import tempfile

data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]

f = tempfile.NamedTemporaryFile(mode='w+')
json.dump(data, f)
f.flush()

print open(f.name, 'r').read()

Output:

[{"a": "A", "c": 3.0, "b": [2, 4]}]

Similarly:

import json
import tempfile

f = tempfile.NamedTemporaryFile(mode='w+')
f.write('[{"a": "A", "c": 3.0, "b": [2, 4]}]')
f.flush()
f.seek(0)

print json.load(f)

Output:

[{u'a': u'A', u'c': 3.0, u'b': [2, 4]}]
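In Python 3 the same write-then-read cycle might look like the sketch below, which uses context managers so the files are closed automatically. The file name data.json is a hypothetical choice, and the file is created inside a temporary directory that is removed afterwards.

```python
import json
import os
import tempfile

data = [{"a": "A", "b": (2, 4), "c": 3.0}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")  # hypothetical file name

    # dump() writes the encoded JSON straight to the file object.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, sort_keys=True)

    # load() reads and decodes from the file object.
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded)
```

As with loads, the tuple written by dump comes back from load as a list.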
