Formatting XML and Converting It to a Dictionary in Python
XML processing
1. Formatting XML
Usage: format an XML string by posting it to an online formatting service.
import requests

xml_text = '<?xml version="1.0" encoding="ISO-8859-1"?><note><to>George</to><from>John</from><heading>Reminder</heading><body>Do not forget the meeting!</body></note>'
# NOTE: the URL and Host value appear truncated in the source; use the full service URL, including the scheme.
url = "web.chacuo/formatxml"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
    "Host": "web.chacuo",
    "X-Requested-With": "XMLHttpRequest",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
}
form_data = {"data": xml_text, "type": "format", "beforeSend": "undefined"}
resp = requests.post(url, data=form_data, headers=headers, timeout=20)
resp.raise_for_status()  # stop early on an HTTP error
# The service replies with JSON; the formatted XML is the first item of "data".
print(resp.json()['data'][0])
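If the online service is unreachable, the same kind of formatting can be done locally. The following is a minimal sketch using the standard library's xml.dom.minidom. It is not the method used in this post, and the helper name pretty_xml_local and the two-space indent are assumptions made for illustration.

```python
from xml.dom import minidom


def pretty_xml_local(text: str, indent: str = "  ") -> str:
    """Hypothetical offline helper: indent an XML string with xml.dom.minidom."""
    dom = minidom.parseString(text)
    pretty = dom.toprettyxml(indent=indent)
    # toprettyxml can emit whitespace-only lines when the input is already
    # indented, so drop them to keep the result compact.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


# Shortened version of the sample document used in this post.
xml_text = "<note><to>George</to><from>John</from></note>"
print(pretty_xml_local(xml_text))
```

Note that minidom adds its own XML declaration as the first line of the output, so the result is not byte-for-byte what the web service would return.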
2. Converting XML to a dictionary (import xmltodict)
xmltodict.parse() converts an XML string into a dictionary.
xmltodict.unparse() converts a dictionary back into an XML string.
import xmltodict

format_ed_xml = '<?xml version="1.0" encoding="ISO-8859-1"?><note><to>George</to><from>John</from><heading>Reminder</heading><body>Do not forget the meeting!</body></note>'
dict_xml = xmltodict.parse(format_ed_xml)
print(dict_xml)
# Output (recent xmltodict releases may print a plain dict instead of an OrderedDict):
# OrderedDict([('note', OrderedDict([('to', 'George'), ('from', 'John'), ('heading', 'Reminder'), ('body', 'Do not forget the meeting!')]))])
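To make the shape of that result easier to see, here is a rough standard-library sketch of the same kind of mapping: nested elements become nested dictionaries, and leaf elements become their text. It is not xmltodict's implementation. The helper element_to_dict is hypothetical, and it ignores attributes and mixed content, which xmltodict handles in its own way.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Hypothetical helper: map an element to its text or a dict of its children."""
    children = list(elem)
    if not children:
        return elem.text  # leaf element: keep its text content
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated sibling tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Sample document from this post, without the XML declaration.
xml_text = (
    "<note><to>George</to><from>John</from>"
    "<heading>Reminder</heading><body>Do not forget the meeting!</body></note>"
)
root = ET.fromstring(xml_text)
note = {root.tag: element_to_dict(root)}
print(note["note"]["body"])
```

Values are then reached by key, for example note["note"]["to"], in the same way as with the dictionary returned by xmltodict.parse().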
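3. Fixing the error "ExpatError: XML or text declaration not at start of entity"
Method 1: Format the XML string first, as in step 1, and then convert it to a dictionary.
Method 2: If errors remain after formatting as in step 1, the cause is that characters such as <, > and & are being read by the XML parser as markup without a matching counterpart. Replace these characters at the line and column reported in the error message. (You can save the formatted text as an XML file and open it in Notepad++; when you edit and save, it indicates which line is wrong, and the current line and column are shown in the bottom-right corner.)
Method 3: Open the XML file in Internet Explorer, copy the XML content from there, and format it again. This should resolve the problem; if an error is still reported, fix it with Method 2.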
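The manual steps above can also be approximated in code. One common trigger for this particular message is a byte order mark or whitespace before the <?xml ...?> declaration, and a bare & is a frequent cause of follow-up errors. The sketch below assumes those two cases only. The helpers clean_xml_text and find_xml_error are hypothetical names, and the parser is the standard library's xml.parsers.expat, which also reports the line and column of the first error.

```python
import re
from xml.parsers.expat import ExpatError, ParserCreate


def clean_xml_text(text: str) -> str:
    """Hypothetical helper: drop a leading BOM/whitespace and escape bare '&'."""
    text = text.lstrip("\ufeff \t\r\n")
    # Escape '&' unless it already starts an entity such as &amp; or &#38;
    return re.sub(r"&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)", "&amp;", text)


def find_xml_error(text: str):
    """Return (line, column, message) for the first parse error, or None."""
    parser = ParserCreate()
    try:
        parser.Parse(text, True)
    except ExpatError as err:
        return err.lineno, err.offset, str(err)
    return None


# Made-up input: a newline before the declaration and an unescaped '&'.
broken = '\n<?xml version="1.0"?><note><body>Tom & Jerry</body></note>'
print(find_xml_error(broken))                  # location and message of the first error
print(find_xml_error(clean_xml_text(broken)))  # None once the text parses cleanly
```

Stray < or > characters inside text are harder to repair automatically, so for those the reported line and column are still the best guide to a manual fix.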
4. Complete code
Format the XML > convert the XML to a dictionary > save it as an XML file
import requests
import xmltodict


def pretty_xml(text: str) -> str:
    """
    Format an unformatted XML string.
    :param text: the XML string to format
    :return: the formatted string
    """
    # NOTE: the URL and Host value appear truncated in the source; use the full service URL, including the scheme.
    url = "web.chacuo/formatxml"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
        "Host": "web.chacuo",
        "X-Requested-With": "XMLHttpRequest",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    }
    form_data = {"data": text, "type": "format", "beforeSend": "undefined"}
    resp = requests.post(url, data=form_data, headers=headers, timeout=20)
    resp.raise_for_status()  # stop early on an HTTP error
    formatted = resp.json()['data'][0]  # parse the JSON response only once
    print(formatted)
    return formatted


def save_xml(pretty_xml_str: str) -> None:
    """Save the XML string to an XML file."""
    # NOTE: the file name "l" appears truncated in the source; substitute your own path.
    with open("l", "w", encoding="utf-8") as fp:
        fp.write(pretty_xml_str)


def xml_to_dict(format_ed_xml: str) -> dict:
    """Convert XML to a dictionary."""
    dict_xml = xmltodict.parse(format_ed_xml)
    print(f"\n>>>>{dict_xml['note']['body']}")
    return dict_xml


if __name__ == "__main__":
    xml_text = '<?xml version="1.0" encoding="ISO-8859-1"?><note><to>George</to><from>John</from><heading>Reminder</heading><body>Do not forget the meeting!</body></note>'
    format_xml = pretty_xml(xml_text)  # format the XML
    xml_to_dict(format_xml)  # convert the XML to a dictionary
    save_xml(format_xml)  # save the XML file