c++jsoncpp中⽂和uXXXX使⽤toStyledString⽣成字符串中⽂
乱码解决⽅案
⽬录
⼀、中⽂乱码解决⽅法
1.1、乱码展⽰
在使⽤jsoncpp解析含有中⽂的字符串的时候,使⽤toStyledString()函数⽣成的字符串中的中⽂部分将变成\u加4个16进制数字会出现解析乱码的情况。
⽐如:
1.2、乱码原因及解决⽅法
static String valueToQuotedStringN(const char* value, unsigned length) {
if (value == nullptr)
return "";
if (!isAnyCharRequiredQuoting(value, length))
return String("\"") + value + "\"";
// We have to walk value and escape any special characters.
// Appending to String is not efficient, but this should be rare.
// (Note: forward slashes are *not* rare, but I am not escaping them.)
String::size_type maxsize = length * 2 + 3; // allescaped+quotes+NULL
String result;
result += "\"";
char const* end = value + length;
for (const char* c = value; c != end; ++c) {
switch (*c) {
case '\"':
result += "\\\"";
break;
case '\\':
result += "\\\\";
break;
case '\b':
result += "\\b";
break;
case '\f':
result += "\\f";
break;
case '\n':
case '\n':
result += "\\n";
break;
case '\r':
result += "\\r";
break;
case '\t':
result += "\\t";
break;
// case '/':
// Even though \/ is considered a legal escape in JSON, a bare
// slash is also legal, so I see no reason to escape it.
// (I hope I am not misunderstanding something.)
// blep notes: actually escaping \/ may be useful in javascript to avoid </
// sequence.
/
/ Should add a flag to allow this compatibility mode and prevent this
// sequence from occurring.
default: {
unsigned int cp = utf8ToCodepoint(c, end);
// don't escape non-control characters
// (short escape sequence are applied above)
if (cp < 0x80 && cp >= 0x20)
result += static_cast<char>(cp);
else if (cp < 0x10000) { // codepoint is in Basic Multilingual Plane
result += "\\u";
result += toHex16Bit(cp);
}
else { // codepoint is not in Basic Multilingual Plane
// convert to surrogate pair first
cp -= 0x10000;
result += "\\u";
result += toHex16Bit((cp >> 10) + 0xD800);
result += "\\u";
result += toHex16Bit((cp & 0x3FF) + 0xDC00);
}
}break;
}
}
result += "\"";
return result;
}
通过代码可以明⽩的看到default:⾥⾯处理的就是包括中⽂在内的字符:于是我们可以修改源代码重新编译库。将:
default: {
unsigned int cp = utf8ToCodepoint(c, end);
// don't escape non-control characters
// (short escape sequence are applied above)
if (cp < 0x80 && cp >= 0x20)
result += static_cast<char>(cp);
else if (cp < 0x10000) { // codepoint is in Basic Multilingual Plane result += "\\u";
result += toHex16Bit(cp);
}
else { // codepoint is not in Basic Multilingual Plane
// convert to surrogate pair first
cp -= 0x10000;
result += "\\u";
中文字符unicode查询result += toHex16Bit((cp >> 10) + 0xD800);
result += "\\u";
result += toHex16Bit((cp & 0x3FF) + 0xDC00);
}
/
/result += *c;
}break;
改为:
default: {
result += *c;
}break;
最终结果为:
参考链接:
⼆、含有\uXXXX解析乱码的解决⽅法
2.1、乱码展⽰
json⽂件如下:
解析结果:
2.2、乱码原因
之前改过valueToQuotedStringN函数,这个函数是将字符串转化为unicode编码,所以直接读取\uXXXX格式的字符串得到的其实是utf-8的字符串(如果读的是中⽂才是unicode编码)。所以这⾥需要额外的将字符串转化为unicode代码
2.3、解决⽅法
utf-8转unicode:
wstring UTF8ToUnicode(const string& str)
{
int len = 0;
len = str.length();
int unicodeLen = ::MultiByteToWideChar(CP_UTF8,
0,
str.c_str(),
-1,
NULL,
0);
wchar_t * pUnicode;
pUnicode = new wchar_t[unicodeLen + 1];
memset(pUnicode, 0, (unicodeLen + 1) * sizeof(wchar_t));
:
:MultiByteToWideChar(CP_UTF8,
0,
str.c_str(),
-1,
(LPWSTR)pUnicode,
unicodeLen);
wstring rt;
rt = (wchar_t*)pUnicode;
delete pUnicode;
return rt;
}
在程序中加⼊该函数,并调⽤:
std::string ws2s(const std::wstring& ws)
{
std::string curLocale = setlocale(LC_ALL, NULL);
setlocale(LC_ALL, "chs");
const wchar_t* _Source = ws.c_str();
size_t _Dsize = 2 * ws.size() + 1;
char *_Dest = new char[_Dsize];
memset(_Dest, 0, _Dsize);
wcstombs(_Dest, _Source, _Dsize);
std::string result = _Dest;
delete[]_Dest;
setlocale(LC_ALL, curLocale.c_str());
return result;
}
//调⽤
std::string content = root["Cnki"][i]["content"].toStyledString(); wstring wstr = UTF8ToUnicode(content);//将utf-8转化为unicode格式cout << ws2s(wstr) << endl;
结果:
参考链接:
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。
发表评论