
2025-03-12 19:19:15


C++ jsoncpp: fixing garbled Chinese and \uXXXX text in strings generated by toStyledString


1. Fixing garbled Chinese text

1.1 What the garbling looks like

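When jsoncpp is used to parse a string that contains Chinese, the string produced by toStyledString() no longer shows the Chinese characters: each one is written as \u followed by four hexadecimal digits, which reads as garbled text.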

For example:
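As an illustration of the symptom, the sketch below reproduces the arithmetic by hand for a single character. It does not call jsoncpp; the helper names decodeThreeByteUtf8 and toJsonEscape are hypothetical and exist only for this example. It assumes the input is a well-formed 3-byte UTF-8 sequence, which is the range most Chinese characters fall into.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: decode one 3-byte UTF-8 sequence (U+0800..U+FFFF)
// into its code point. Returns 0 if fewer than 3 bytes are supplied.
// No further validation is done; this only illustrates the bit arithmetic.
unsigned decodeThreeByteUtf8(const std::string& s) {
    if (s.size() < 3)
        return 0;
    unsigned b0 = static_cast<unsigned char>(s[0]);
    unsigned b1 = static_cast<unsigned char>(s[1]);
    unsigned b2 = static_cast<unsigned char>(s[2]);
    return ((b0 & 0x0Fu) << 12) | ((b1 & 0x3Fu) << 6) | (b2 & 0x3Fu);
}

// Hypothetical helper: format a code point in the Basic Multilingual Plane
// as a JSON \uXXXX escape with lowercase hexadecimal digits.
std::string toJsonEscape(unsigned cp) {
    char buf[8];
    std::snprintf(buf, sizeof buf, "\\u%04x", cp & 0xFFFFu);
    return buf;
}
```

The character 中 is stored in UTF-8 as the bytes E4 B8 AD. Passing those bytes to decodeThreeByteUtf8 gives 0x4E2D, and toJsonEscape(0x4E2D) gives the six characters \u4e2d, which is the kind of text that shows up in place of the Chinese.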

1.2 Cause and fix

    static String valueToQuotedStringN(const char* value, unsigned length) {
        if (value == nullptr)
            return "";

        if (!isAnyCharRequiredQuoting(value, length))
            return String("\"") + value + "\"";
        // We have to walk value and escape any special characters.
        // Appending to String is not efficient, but this should be rare.
        // (Note: forward slashes are *not* rare, but I am not escaping them.)
        String::size_type maxsize = length * 2 + 3; // allescaped+quotes+NULL
        String result;
        result.reserve(maxsize); // to avoid lots of mallocs
        result += "\"";
        char const* end = value + length;
        for (const char* c = value; c != end; ++c) {
            switch (*c) {
            case '\"':
                result += "\\\"";
                break;
            case '\\':
                result += "\\\\";
                break;
            case '\b':
                result += "\\b";
                break;
            case '\f':
                result += "\\f";
                break;
            case '\n':
                result += "\\n";
                break;
            case '\r':
                result += "\\r";
                break;
            case '\t':
                result += "\\t";
                break;
            // case '/':
            // Even though \/ is considered a legal escape in JSON, a bare
            // slash is also legal, so I see no reason to escape it.
            // (I hope I am not misunderstanding something.)
            // blep notes: actually escaping \/ may be useful in javascript to avoid </
            // sequence.
            // Should add a flag to allow this compatibility mode and prevent this
            // sequence from occurring.
            default: {
                unsigned int cp = utf8ToCodepoint(c, end);
                // don't escape non-control characters
                // (short escape sequence are applied above)
                if (cp < 0x80 && cp >= 0x20)
                    result += static_cast<char>(cp);
                else if (cp < 0x10000) { // codepoint is in Basic Multilingual Plane
                    result += "\\u";
                    result += toHex16Bit(cp);
                }
                else { // codepoint is not in Basic Multilingual Plane
                    // convert to surrogate pair first
                    cp -= 0x10000;
                    result += "\\u";
                    result += toHex16Bit((cp >> 10) + 0xD800);
                    result += "\\u";
                    result += toHex16Bit((cp & 0x3FF) + 0xDC00);
                }
            } break;
            }
        }
        result += "\"";
        return result;
    }
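The last branch of the function handles code points outside the Basic Multilingual Plane by splitting them into a UTF-16 surrogate pair. The sketch below isolates that step so the arithmetic can be checked on its own. The function name escapeAsSurrogatePair is hypothetical, not part of jsoncpp, and the code assumes its argument lies in U+10000..U+10FFFF.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper mirroring the non-BMP branch: split a code point in
// U+10000..U+10FFFF into a high and a low surrogate and escape both halves.
// The caller is assumed to pass a value in that range.
std::string escapeAsSurrogatePair(unsigned cp) {
    cp -= 0x10000;                          // leaves a 20-bit value
    unsigned high = (cp >> 10) + 0xD800;    // top 10 bits
    unsigned low = (cp & 0x3FF) + 0xDC00;   // bottom 10 bits
    char buf[16];
    std::snprintf(buf, sizeof buf, "\\u%04x\\u%04x", high, low);
    return buf;
}
```

For U+1F600 the subtraction leaves 0xF600, whose top ten bits are 0x3D and bottom ten bits are 0x200, so the result is \ud83d\ude00.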

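The code makes it clear that the default: branch is what handles every remaining character, Chinese included. One fix is therefore to modify the source and rebuild the library. Change: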

    default: {
        unsigned int cp = utf8ToCodepoint(c, end);
        // don't escape non-control characters
        // (short escape sequence are applied above)
        if (cp < 0x80 && cp >= 0x20)
            result += static_cast<char>(cp);
        else if (cp < 0x10000) { // codepoint is in Basic Multilingual Plane
            result += "\\u";
            result += toHex16Bit(cp);
        }
        else { // codepoint is not in Basic Multilingual Plane
            // convert to surrogate pair first
            cp -= 0x10000;
            result += "\\u";
            result += toHex16Bit((cp >> 10) + 0xD800);
            result += "\\u";
            result += toHex16Bit((cp & 0x3FF) + 0xDC00);
        }
        //result += *c;
    } break;

to:

    default: {
        result += *c;
    } break;
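One thing to keep in mind with the one-line patch is that it also copies control characters below 0x20 through unescaped, which JSON does not allow inside a string. A more conservative variant can be sketched as follows. This is a stand-alone illustration, not jsoncpp code: the function name quoteKeepingUtf8 is hypothetical, and the input is assumed to be valid UTF-8 already.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-alone escaper (not jsoncpp's API): quotes a UTF-8
// string for JSON, escaping only what JSON requires and copying every
// byte >= 0x20 that has no short escape, including all multi-byte UTF-8
// sequences, through unchanged.
std::string quoteKeepingUtf8(const std::string& value) {
    std::string result = "\"";
    for (unsigned char ch : value) {
        switch (ch) {
        case '"':  result += "\\\""; break;
        case '\\': result += "\\\\"; break;
        case '\b': result += "\\b";  break;
        case '\f': result += "\\f";  break;
        case '\n': result += "\\n";  break;
        case '\r': result += "\\r";  break;
        case '\t': result += "\\t";  break;
        default:
            if (ch < 0x20) {
                // Remaining control characters still need a \u00XX escape,
                // otherwise the output is not valid JSON.
                char buf[8];
                std::snprintf(buf, sizeof buf, "\\u%04x",
                              static_cast<unsigned>(ch));
                result += buf;
            } else {
                result += static_cast<char>(ch);
            }
        }
    }
    result += "\"";
    return result;
}
```

Applied to the bytes of 中 followed by a newline, this yields the quoted string with 中 intact and the newline written as \n. Newer jsoncpp releases also appear to offer an emitUTF8 setting on StreamWriterBuilder with a similar effect; that is an assumption to check against the documentation of the version in use, and if it holds, patching the library may not be needed at all.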

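Final result: after the library is rebuilt, toStyledString() writes the Chinese characters out directly instead of as \uXXXX escapes.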


2. Fixing garbled text when the input contains \uXXXX

2.1 What the garbling looks like

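The JSON file contains strings written as \uXXXX escapes. When they are parsed and printed, the result shows up as garbled text.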

2.2 Cause

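Earlier we modified valueToQuotedStringN, the function that used to turn non-ASCII characters into \uXXXX escapes. When jsoncpp reads a \uXXXX escape it decodes it to UTF-8, and with the patched writer those bytes are no longer escaped again, so the string that comes back is plain UTF-8. A console that expects a different encoding (the Simplified Chinese code page on Windows, for instance) displays those bytes as garbage. The string therefore needs an extra conversion: from UTF-8 to a wide-character Unicode string, and from there to the local encoding.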
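To make the decoding step concrete, here is a minimal sketch of how a code point taken from a \uXXXX escape is turned into UTF-8 bytes. The function name codePointToUtf8 is chosen for this example and the code is written from the UTF-8 definition, not copied from jsoncpp. It assumes the caller has already combined any surrogate pair into a single code point.

```cpp
#include <string>

// Illustrative helper: encode one Unicode code point as UTF-8.
// Values above U+10FFFF produce an empty string.
std::string codePointToUtf8(unsigned cp) {
    std::string out;
    if (cp <= 0x7F) {
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For the escape \u4e2d this produces the three bytes E4 B8 AD. Those bytes are correct UTF-8, but they only display as 中 when the program printing them and the terminal agree on UTF-8.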

2.3 Fix

UTF-8 to Unicode (a wide string, UTF-16 on Windows):

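    #include <windows.h>
    #include <cstring>
    #include <string>

    // Convert a UTF-8 string to a wide (UTF-16) string with the Win32 API.
    std::wstring UTF8ToUnicode(const std::string& str)
    {
        // First call: ask for the required length in wchar_t units.
        // Passing -1 as the input length makes the count include the
        // terminating null character.
        int unicodeLen = ::MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, NULL, 0);
        if (unicodeLen <= 0)
            return std::wstring();  // conversion failed

        wchar_t* pUnicode = new wchar_t[unicodeLen + 1];
        memset(pUnicode, 0, (unicodeLen + 1) * sizeof(wchar_t));

        // Second call: perform the conversion into the buffer.
        ::MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, pUnicode, unicodeLen);

        std::wstring rt = pUnicode;
        delete[] pUnicode;  // array form, because the buffer came from new[]
        return rt;
    }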
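MultiByteToWideChar is only available on Windows. For readers who want to see what the conversion involves, or who need it elsewhere, the same idea can be sketched in portable C++. The function utf8ToUtf16 below is a hypothetical counterpart written for this example. It produces std::u16string instead of std::wstring because wchar_t is not 16 bits wide on every platform, and it does only light validation: an invalid lead byte or a truncated sequence becomes U+FFFD, while continuation bytes are not checked.

```cpp
#include <cstddef>
#include <string>

// Hypothetical portable counterpart of UTF8ToUnicode: decode UTF-8 into
// UTF-16 code units. Invalid lead bytes and truncated sequences become
// U+FFFD; continuation bytes are assumed to be well formed.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned b0 = static_cast<unsigned char>(in[i]);
        unsigned cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b0 < 0x80)              { cp = b0;        extra = 0; }
        else if ((b0 >> 5) == 0x06) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 >> 4) == 0x0E) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 >> 3) == 0x1E) { cp = b0 & 0x07; extra = 3; }
        else { out += static_cast<char16_t>(0xFFFD); ++i; continue; }

        if (extra > in.size() - i - 1) {  // sequence runs past the end
            out += static_cast<char16_t>(0xFFFD);
            break;
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned b = static_cast<unsigned char>(in[i + k]);
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {  // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

The Win32 call remains the simpler choice on Windows; this version is mainly useful for seeing that the wide string holds the same code points as the UTF-8 input, just in a different encoding form.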

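Add this function to the program and call it, together with a helper that converts the wide string to a narrow string in the local encoding: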

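    #include <clocale>
    #include <cstdlib>
    #include <cstring>
    #include <string>

    // Convert a wide string to a narrow string in the "chs" (Simplified
    // Chinese) locale. "chs" is a locale name understood by the Microsoft
    // C runtime; other platforms use different names.
    std::string ws2s(const std::wstring& ws)
    {
        // Remember the current locale so it can be restored afterwards.
        std::string curLocale = setlocale(LC_ALL, NULL);
        setlocale(LC_ALL, "chs");

        // Allow up to 2 bytes per wide character plus the terminating null.
        size_t destSize = 2 * ws.size() + 1;
        char* dest = new char[destSize];
        memset(dest, 0, destSize);

        // wcstombs returns (size_t)-1 if a character cannot be converted.
        size_t written = wcstombs(dest, ws.c_str(), destSize);
        std::string result = (written == (size_t)-1) ? std::string() : std::string(dest);
        delete[] dest;

        setlocale(LC_ALL, curLocale.c_str());
        return result;
    }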

    // Usage
    std::string content = root["Cnki"][i]["content"].toStyledString();
    std::wstring wstr = UTF8ToUnicode(content);  // convert UTF-8 to Unicode
    std::cout << ws2s(wstr) << std::endl;

Result: the text is now displayed correctly.
