MySQL Character Sets: utf8, utf8mb3, and utf8mb4 -- 688IT编程网

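To understand MySQL character sets, start with how the official documentation describes them. If your English is good, reading the official documentation should pose no problem: type a term into its search box and you will find the relevant explanation. I have organized the main points here for later reference. Character sets are covered in the following chapter of the official documentation: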

Chapter 10 Character Sets, Collations, Unicode

I. Character Set Settings

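MySQL can:

1. Store strings using a variety of character sets.

2. Compare strings using a variety of collations.

3. Mix strings with different character sets or collations in the same server, the same database, or even the same table.

4. Enable specification of character set and collation at any level.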

MySQL supports the following character sets (the server reports 40; only part of the output is reproduced here):

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                 | cp866_general_ci    |      1 |
| cp852    | DOS Central European        | cp852_general_ci    |      1 |
| cp1251   | Windows Cyrillic            | cp1251_general_ci   |      1 |
| utf16    | UTF-16 Unicode              | utf16_general_ci    |      4 |
| cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic              | cp1257_general_ci   |      1 |
| utf32    | UTF-32 Unicode              | utf32_general_ci    |      4 |
| cp932    | SJIS for Windows Japanese   | cp932_japanese_ci   |      2 |
+----------+-----------------------------+---------------------+--------+
40 rows in set (0.00 sec)
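To narrow the listing to the character sets this article is about, SHOW CHARACTER SET accepts a LIKE pattern, and the same data can be read from information_schema. This is a minimal sketch; the rows returned depend on the server version (newer servers may report utf8 as utf8mb3).

```sql
-- List only the UTF-8 family of character sets.
SHOW CHARACTER SET LIKE 'utf8%';

-- The same information from information_schema, including the
-- maximum number of bytes per character (MAXLEN).
SELECT CHARACTER_SET_NAME, DEFAULT_COLLATE_NAME, MAXLEN
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME LIKE 'utf8%';

-- List the collations available for one character set.
SHOW COLLATION WHERE Charset = 'utf8mb4';
```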

String expressions have a repertoire attribute, which can have two values:

ASCII: The expression can contain only characters in the Unicode range U+0000 to U+007F.

UNICODE: The expression can contain characters in the Unicode range U+0000 to U+10FFFF. This includes characters in the Basic Multilingual Plane (BMP) range (U+0000 to U+FFFF) and supplementary characters outside the BMP range (U+10000 to U+10FFFF).

SET NAMES 'utf8';
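SET NAMES tells the server which character set the client uses for the statements it sends and for the results it expects back. Per the MySQL manual, it is equivalent to setting three session variables. The sketch below uses utf8mb4 as an assumed choice, so that supplementary characters survive the round trip.

```sql
-- One statement...
SET NAMES 'utf8mb4';

-- ...has the same effect as these three:
SET character_set_client = utf8mb4;
SET character_set_results = utf8mb4;
SET character_set_connection = utf8mb4;

-- Inspect the character set variables of the current session.
SHOW VARIABLES LIKE 'character_set%';
```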

There are default settings for character sets and collations at four levels: server, database, table, and column.

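Collation names may end in the following suffixes:

_ai Accent-insensitive

_as Accent-sensitive

_ci Case-insensitive

_cs Case-sensitive

_bin Binary

A collation whose name contains _ci is explicitly case-insensitive and, when the name has neither _ai nor _as, implicitly accent-insensitive.

A collation whose name contains _cs is explicitly case-sensitive and implicitly accent-sensitive; that is, _cs implies _as.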
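A few comparisons make the suffix rules concrete. This is a small sketch using utf8mb4 collations; the introducers are there so the statements do not depend on the connection character set, and the accented letter is written as a hex literal so it does not depend on the client encoding.

```sql
-- _ci: case is ignored, so the comparison is true (returns 1).
SELECT _utf8mb4'a' = _utf8mb4'A' COLLATE utf8mb4_general_ci;

-- _bin: code points are compared directly, so the comparison is false (returns 0).
SELECT _utf8mb4'a' = _utf8mb4'A' COLLATE utf8mb4_bin;

-- _ci with no _as in the name is also accent-insensitive.
-- X'C3A9' is the UTF-8 encoding of U+00E9 (e with acute accent); returns 1.
SELECT _utf8mb4'e' = _utf8mb4 X'C3A9' COLLATE utf8mb4_general_ci;
```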

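The server character set is controlled by the following option:

character-set-server

Method 1: specify the character set when starting the server. Assuming the compiled-in default is latin1 with collation latin1_swedish_ci (as in MySQL 5.7 and earlier), these three commands have the same effect:

mysqld

mysqld --character-set-server=latin1

mysqld --character-set-server=latin1 \
  --collation-server=latin1_swedish_ci

Method 2: change the compiled-in defaults when building from source:

cmake . -DDEFAULT_CHARSET=latin1

or

cmake . -DDEFAULT_CHARSET=latin1 \
  -DDEFAULT_COLLATION=latin1_german1_ci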

The current server character set and collation can be determined from the values of the character_set_server and collation_server system variables. These variables can be changed at runtime.
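The two system variables can be read and changed from any client session. The sketch below assumes an account with enough privilege to set global variables; utf8mb4 and utf8mb4_unicode_ci are illustrative choices, not recommendations from the manual.

```sql
-- Current server-level defaults.
SELECT @@character_set_server, @@collation_server;

-- Change the server defaults at runtime. Existing databases keep their own
-- defaults; the server default applies to CREATE DATABASE statements
-- that omit the CHARACTER SET and COLLATE clauses.
SET GLOBAL character_set_server = 'utf8mb4';
SET GLOBAL collation_server = 'utf8mb4_unicode_ci';
```

A value set with SET GLOBAL does not survive a server restart, so a permanent change still belongs in the startup options shown earlier.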

II. Database Character Set and Collation

CREATE DATABASE db_name [[DEFAULT] CHARACTER SET charset_name] [[DEFAULT] COLLATE collation_name]

ALTER DATABASE db_name [[DEFAULT] CHARACTER SET charset_name] [[DEFAULT] COLLATE collation_name]

The keyword SCHEMA can be used instead of DATABASE.

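All database options are stored in a text file named db.opt that can be found in the database directory. (This applies to MySQL 5.7 and earlier; MySQL 8.0 keeps these options in the data dictionary instead.)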

The CHARACTER SET and COLLATE clauses make it possible to create databases with different character sets and collations on the same MySQL server.

To see the default character set and collation of a database:

USE db_name;

SELECT @@character_set_database, @@collation_database;
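Putting the syntax above to work, here is a short sketch that creates a database with explicit defaults and reads them back. demo_db is a hypothetical name used only for illustration.

```sql
-- Create a database whose tables default to utf8mb4.
CREATE DATABASE demo_db
  DEFAULT CHARACTER SET utf8mb4
  DEFAULT COLLATE utf8mb4_unicode_ci;

-- Read the defaults back without switching databases.
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'demo_db';

-- Change the defaults later. Existing tables are not converted;
-- only tables created afterwards inherit the new values.
ALTER DATABASE demo_db
  DEFAULT CHARACTER SET utf8mb4
  DEFAULT COLLATE utf8mb4_general_ci;
```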

III. Table Character Set and Collation

The CREATE TABLE and ALTER TABLE statements have optional clauses for specifying the table character set and collation:

CREATE TABLE tbl_name (column_list) [[DEFAULT] CHARACTER SET charset_name] [COLLATE collation_name]

ALTER TABLE tbl_name [[DEFAULT] CHARACTER SET charset_name] [COLLATE collation_name]
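As a concrete instance of the two statements, the following sketch uses a hypothetical table named articles.

```sql
-- Table-level defaults apply to every character column that does not
-- specify its own character set.
CREATE TABLE articles (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(100) NOT NULL
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Show the resulting definition, including the table defaults.
SHOW CREATE TABLE articles;

-- This changes only the default used for columns added later;
-- existing columns keep their current character set.
ALTER TABLE articles DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```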

IV. Column Character Set and Collation

Every “character” column (that is, a column of type CHAR, VARCHAR, or TEXT) has a column character set and a column collation. Column definition syntax for CREATE TABLE and ALTER TABLE has optional clauses for specifying the column character set and collation:

col_name {CHAR | VARCHAR | TEXT} (col_length) [CHARACTER SET charset_name] [COLLATE collation_name]

col_name {ENUM | SET} (val_list) [CHARACTER SET charset_name] [COLLATE collation_name]
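A column-level clause overrides the table default. The sketch below mixes character sets within one hypothetical table, members, which is the "even the same table" case mentioned at the start of the article.

```sql
CREATE TABLE members (
  -- Free text that may contain any Unicode character.
  nickname VARCHAR(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  -- Two-letter codes fit in a single-byte character set.
  country_code CHAR(2) CHARACTER SET latin1 COLLATE latin1_general_ci,
  -- ENUM and SET columns accept the same clauses.
  status ENUM('active', 'banned') CHARACTER SET latin1
) DEFAULT CHARACTER SET utf8mb4;

-- The Collation column shows what each column actually uses.
SHOW FULL COLUMNS FROM members;
```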

V. Character String Literal Character Set and Collation

For the simple statement SELECT 'string', the string has the connection default character set and collation defined by the character_set_connection and collation_connection system variables.

A character string literal may have an optional character set introducer and COLLATE clause, to designate it as a string that uses a particular character set and collation:

[_charset_name]'string' [COLLATE collation_name]

Examples:

SELECT 'abc';

SELECT _latin1'abc';

SELECT _binary'abc';

SELECT _utf8'abc' COLLATE utf8_danish_ci;

VI. The National Character Set

Standard SQL defines NCHAR or NATIONAL CHAR as a way to indicate that a CHAR column should use some predefined character set. MySQL uses utf8 as this predefined character set. For example, these data type declarations are equivalent:

CHAR(10) CHARACTER SET utf8

NATIONAL CHARACTER(10)

NCHAR(10)

As are these:

VARCHAR(10) CHARACTER SET utf8

NATIONAL VARCHAR(10)

NVARCHAR(10)

NCHAR VARCHAR(10)

NATIONAL CHARACTER VARYING(10)

NATIONAL CHAR VARYING(10)
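The equivalence can be checked on a live server. This is a minimal sketch with a hypothetical table named nat_demo; how the character set is reported depends on the server version.

```sql
CREATE TABLE nat_demo (
  a NCHAR(10),
  b NATIONAL VARCHAR(10)
);

-- The Collation column reports a utf8 collation for both columns
-- (shown under the utf8mb3 name on newer servers).
SHOW FULL COLUMNS FROM nat_demo;

-- N'...' creates a string literal in the national character set.
SELECT CHARSET(N'abc');
```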

VII. Character Set Introducers

A character string literal, hexadecimal literal, or bit-value literal may have an optional character set introducer and COLLATE clause, to designate it as a string that uses a particular character set and collation:

[_charset_name] literal [COLLATE collation_name]

Character set introducers and the COLLATE clause are implemented according to standard SQL specifications.

Examples:

SELECT 'abc';

SELECT _latin1'abc';

SELECT _binary'abc';

SELECT _utf8'abc' COLLATE utf8_danish_ci;

SELECT _latin1 X'4D7953514C'; -- hexadecimal literal

SELECT _utf8 0x4D7953514C COLLATE utf8_danish_ci;

SELECT _latin1 b'1000001'; -- bit-value literal

SELECT _utf8 0b1000001 COLLATE utf8_danish_ci;
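The effect of an introducer can be inspected with the built-in CHARSET() and COLLATION() functions. The first query's result depends on the connection settings, so no expected value is given for it.

```sql
-- No introducer: the literal takes the connection character set and collation.
SELECT CHARSET('abc'), COLLATION('abc');

-- With an introducer: latin1 and its default collation, latin1_swedish_ci.
SELECT CHARSET(_latin1'abc'), COLLATION(_latin1'abc');

-- Introducer plus COLLATE: the explicit collation, latin1_bin, is used.
SELECT COLLATION(_latin1'abc' COLLATE latin1_bin);

-- An introducer only labels the bytes; it does not convert them.
-- The bytes 4D 79 53 51 4C read as latin1 give the string 'MySQL'.
SELECT _latin1 X'4D7953514C';
```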

VIII. Unicode Support

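BMP characters have code point values between U+0000 and U+FFFF and take 1 to 3 bytes in UTF-8.

Supplementary characters lie outside the BMP, with code point values between U+10000 and U+10FFFF, and take 4 bytes in UTF-8.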

The following table lists the storage each Unicode character set requires per character:

Character Set    Supported Characters       Required Storage Per Character
utf8mb3, utf8    BMP only                   1, 2, or 3 bytes
ucs2             BMP only                   2 bytes
utf8mb4          BMP and supplementary      1, 2, 3, or 4 bytes
utf16            BMP and supplementary      2 or 4 bytes
utf16le          BMP and supplementary      2 or 4 bytes
utf32            BMP and supplementary      4 bytes
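The byte counts in the table can be confirmed with LENGTH(), which returns bytes, and CHAR_LENGTH(), which returns characters. The sketch below uses hex literals so that it does not depend on the client encoding: X'E4B8AD' is the UTF-8 encoding of U+4E2D, a BMP character, and X'F09F9880' is the UTF-8 encoding of U+1F600, a supplementary character.

```sql
-- A BMP character: one character, 3 bytes in utf8mb4,
-- 2 bytes in ucs2 and utf16, 4 bytes in utf32.
SELECT
  CHAR_LENGTH(_utf8mb4 X'E4B8AD')                  AS bmp_chars,    -- 1
  LENGTH(_utf8mb4 X'E4B8AD')                       AS bmp_utf8mb4,  -- 3
  LENGTH(CONVERT(_utf8mb4 X'E4B8AD' USING ucs2))   AS bmp_ucs2,     -- 2
  LENGTH(CONVERT(_utf8mb4 X'E4B8AD' USING utf16))  AS bmp_utf16,    -- 2
  LENGTH(CONVERT(_utf8mb4 X'E4B8AD' USING utf32))  AS bmp_utf32;    -- 4

-- A supplementary character: one character, 4 bytes in utf8mb4,
-- 4 bytes in utf16 (a surrogate pair), 4 bytes in utf32.
SELECT
  CHAR_LENGTH(_utf8mb4 X'F09F9880')                  AS supp_chars,    -- 1
  LENGTH(_utf8mb4 X'F09F9880')                       AS supp_utf8mb4,  -- 4
  LENGTH(CONVERT(_utf8mb4 X'F09F9880' USING utf16))  AS supp_utf16,    -- 4
  LENGTH(CONVERT(_utf8mb4 X'F09F9880' USING utf32))  AS supp_utf32;    -- 4
```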

IX. Converting Between utf8 (utf8mb3) and utf8mb4

10.9.8 Converting Between 3-Byte and 4-Byte Unicode Character Sets

The utf8mb3 and utf8mb4 character sets differ as follows:

utf8mb3 supports only characters in the Basic Multilingual Plane (BMP). utf8mb4 additionally supports supplementary characters that lie outside the BMP.

Note

This discussion refers to the utf8mb3 and utf8mb4 character set names to be explicit about referring to 3-byte and 4-byte UTF-8 character set data. The exception is that in table definitions, utf8 is used because MySQL converts instances of utf8mb3 specified in such definitions to utf8, which is an alias for utf8mb3.

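Converting between utf8 (utf8mb3) and utf8mb4 is straightforward:

1. Converting utf8 (utf8mb3) to utf8mb4 makes it possible to store supplementary characters.

2. Converting utf8 (utf8mb3) to utf8mb4 may increase the storage space the data requires.

3. For a BMP character, utf8 (utf8mb3) and utf8mb4 use the same code value, the same encoding, and the same length, so nothing changes in the conversion.

4. A supplementary character takes 4 bytes in utf8mb4. Because utf8mb3 cannot store supplementary characters at all, existing utf8mb3 data contains none, so there is no risk of meeting a character that cannot be converted.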

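5. The table structure may need adjusting as part of the conversion. For variable-length character types (VARCHAR and the TEXT types), a column of a given byte capacity holds fewer characters in utf8mb4 than in utf8mb3. For all character types (CHAR, VARCHAR, and the TEXT types), the maximum number of characters that can be indexed is smaller for a utf8mb4 column than for a utf8mb3 column. Before converting, check each column's type and length so that the converted table and index data do not exceed the maximum number of bytes the column or index can hold. For InnoDB index columns, the maximum index key length is 767 bytes, which allows 255 characters in utf8mb3 but only 191 characters in utf8mb4. If that is not enough after conversion, index a shorter prefix or a different column. The row format offers a way to index more bytes:

Note: For InnoDB tables that use the COMPRESSED or DYNAMIC row format, enabling innodb_large_prefix (MySQL 5.6 and 5.7) permits index key prefixes longer than 767 bytes, up to 3072 bytes.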
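The points above suggest a check-then-convert workflow. The following is a sketch under stated assumptions, not a complete migration procedure: demo_db and t1 are hypothetical names, utf8mb4_unicode_ci is an illustrative collation choice, and the 191-character threshold applies only where the 767-byte index limit is in force. Take a backup before running any conversion.

```sql
-- Where the 255 and 191 character limits come from:
-- 767 bytes divided by the maximum bytes per character.
SELECT FLOOR(767 / 3) AS utf8mb3_chars, FLOOR(767 / 4) AS utf8mb4_chars;  -- 255, 191

-- Find indexed utf8 (utf8mb3) columns whose indexed length exceeds
-- 191 characters. SUB_PART is the prefix length, or NULL when the
-- whole column is indexed.
SELECT s.TABLE_NAME, s.INDEX_NAME, s.COLUMN_NAME,
       COALESCE(s.SUB_PART, c.CHARACTER_MAXIMUM_LENGTH) AS indexed_chars
FROM information_schema.STATISTICS AS s
JOIN information_schema.COLUMNS AS c
  ON  c.TABLE_SCHEMA = s.TABLE_SCHEMA
  AND c.TABLE_NAME   = s.TABLE_NAME
  AND c.COLUMN_NAME  = s.COLUMN_NAME
WHERE s.TABLE_SCHEMA = 'demo_db'
  AND c.CHARACTER_SET_NAME IN ('utf8', 'utf8mb3')
  AND COALESCE(s.SUB_PART, c.CHARACTER_MAXIMUM_LENGTH) > 191;

-- Convert the table default and every character column in one statement.
ALTER TABLE t1 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

Columns reported by the second query need a shorter index prefix, a different index, or a row format that allows longer keys before the ALTER TABLE can succeed.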
