MySQL case insensitivity: what is the technical principle behind it? -- 688IT编程网

The following is the description of character sets and collations from the official MySQL 8.0 documentation.

A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.

Suppose that we have an alphabet with four letters: A, B, a, b. We give each letter a number: A = 0, B = 1, a = 2, b = 3. The letter A is a symbol, the number 0 is the encoding for A, and the combination of all four letters and their encodings is a character set.

Suppose that we want to compare two string values, A and B. The simplest way to do this is to look at the encodings: 0 for A and 1 for B. Because 0 is less than 1, we say A is less than B. What we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): “compare the encodings.” We call this simplest of all possible collations a binary collation.
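The imaginary character set and its binary collation can be sketched in a few lines of Python. This is only an illustration of the idea in the quoted passage; the names `ENCODING` and `binary_compare` are hypothetical and do not come from MySQL.

```python
# The imaginary character set: each symbol mapped to its encoding.
ENCODING = {"A": 0, "B": 1, "a": 2, "b": 3}


def binary_compare(x, y):
    """Binary collation: compare the encodings, position by position.

    Returns -1, 0 or 1, like a classic comparison function.
    """
    kx = [ENCODING[ch] for ch in x]
    ky = [ENCODING[ch] for ch in y]
    return (kx > ky) - (kx < ky)


print(binary_compare("A", "B"))  # -1: 0 is less than 1, so A < B
print(binary_compare("B", "a"))  # -1: uppercase letters have the smaller encodings
print(binary_compare("a", "A"))  # 1: a and A are different strings here
```

Under this single rule, `a` and `A` are never equal, which is exactly what the next passage sets out to change.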

But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters a and b as equivalent to A and B; (2) then compare the encodings. We call this a case-insensitive collation. It is a little more complex than a binary collation.
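The two rules of the case-insensitive collation can be sketched the same way. Again this is a toy model of the imaginary four-letter alphabet, not MySQL's implementation; `ENCODING`, `FOLD`, `ci_key` and `ci_compare` are hypothetical names.

```python
# The imaginary character set: each symbol mapped to its encoding.
ENCODING = {"A": 0, "B": 1, "a": 2, "b": 3}

# Rule 1: treat the lowercase letters a and b as equivalent to A and B.
FOLD = {"a": "A", "b": "B"}


def ci_key(s):
    """Apply rule 1 (fold case), then look up the encodings for rule 2."""
    return [ENCODING[FOLD.get(ch, ch)] for ch in s]


def ci_compare(x, y):
    """Rule 2: compare the encodings of the folded strings."""
    kx, ky = ci_key(x), ci_key(y)
    return (kx > ky) - (kx < ky)


print(ci_compare("a", "A"))    # 0: equal under the case-insensitive collation
print(ci_compare("ab", "AB"))  # 0
print(ci_compare("a", "B"))    # -1: a is treated as A, and A < B
```

The extra folding step before every comparison is the "little more complex" part: the collation now does work that a binary collation does not.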

In real life, most character sets have many characters: not just A and B but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules, not just for whether to distinguish lettercase, but also for whether to distinguish accents (an “accent” is a mark attached to a character as in German Ö), and for multiple-character mappings (such as the rule that Ö = OE in one of the two German collations).

(Chinese is a typical example of a writing system with thousands of characters.)
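A multiple-character mapping such as Ö = OE can be sketched as a sort key that expands certain characters before comparing. The expansion table below is a hypothetical, deliberately tiny stand-in for such a rule, and `phonebook_key` is an illustrative name; real collations handle far more cases.

```python
# Hypothetical expansion table, loosely modelled on the rule that
# O-umlaut compares as "OE". Not a complete German collation.
EXPANSIONS = {
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ä": "ae", "ö": "oe", "ü": "ue",
}


def phonebook_key(s):
    """Expand mapped characters, then fold case for comparison."""
    expanded = "".join(EXPANSIONS.get(ch, ch) for ch in s)
    return expanded.upper()


print(phonebook_key("Öl") == phonebook_key("OEL"))        # True
print(sorted(["Ofen", "Öl", "Oder"], key=phonebook_key))  # ['Oder', 'Öl', 'Ofen']
```

One character on the left compares as two characters on the right, which is why such a collation cannot work character by character the way the binary rule does.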

MySQL can do these things for you:

- Store strings using a variety of character sets.
- Compare strings using a variety of collations.
- Mix strings with different character sets or collations in the same server, the same database, or even the same table.
- Enable specification of character set and collation at any level.

To use these features effectively, you must know what character sets and collations are available, how to change the defaults, and how they affect the behavior of string operators and functions.

The following is the description of collation naming conventions from the official MySQL 8.0 documentation.

MySQL collation names follow these conventions:

- A collation name starts with the name of the character set with which it is associated, generally followed by one or more suffixes indicating other collation characteristics. For example, utf8mb4_general_ci and latin1_swedish_ci are collations for the utf8mb4 and latin1 character sets, respectively. The binary character set has a single collation, also named binary, with no suffixes.

- A language-specific collation includes a locale code or language name. For example, utf8mb4_tr_0900_ai_ci and utf8mb4_hu_0900_ai_ci sort characters for the utf8mb4 character set using the rules of Turkish and Hungarian, respectively. utf8mb4_turkish_ci and utf8mb4_hungarian_ci are similar but based on a less recent version of the Unicode Collation Algorithm.

- Collation suffixes indicate whether a collation is case-sensitive, accent-sensitive, or kana-sensitive (or some combination thereof), or binary. The following table shows the suffixes used to indicate these characteristics.

    Suffix   Meaning
    _ai      accent-insensitive
    _as      accent-sensitive
    _ci      case-insensitive
    _cs      case-sensitive
    _ks      kana-sensitive
    _bin     binary

For nonbinary collation names that do not specify accent sensitivity, it is determined by case sensitivity. If a collation name does not contain _ai or _as, _ci in the name implies _ai and _cs in the name implies _as. For example, latin1_general_ci is explicitly case-insensitive and implicitly accent-insensitive, latin1_general_cs is explicitly case-sensitive and implicitly accent-sensitive, and utf8mb4_0900_ai_ci is explicitly case-insensitive and accent-insensitive.
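The naming rules above are mechanical enough to sketch as a small parser. The function below is a hypothetical helper, not part of MySQL or any client library; it assumes the character set name contains no underscore and reads only the trailing suffixes, so that a locale code such as `cs` (Czech) in `utf8mb4_cs_0900_ai_ci` is not mistaken for the case-sensitive suffix.

```python
# Suffixes that describe collation characteristics.
SUFFIXES = {"ai", "as", "ci", "cs", "ks", "bin"}


def describe_collation(name):
    """Derive sensitivity flags from a collation name (illustrative only)."""
    parts = name.split("_")
    charset = parts[0]  # assumption: the charset name has no underscore

    # Collect suffix tokens from the end of the name only.
    flags = set()
    for token in reversed(parts[1:]):
        if token not in SUFFIXES:
            break
        flags.add(token)

    if name == "binary" or "bin" in flags:
        # Comparison by code value distinguishes both case and accents.
        return {"charset": charset, "binary": True,
                "case_sensitive": True, "accent_sensitive": True}

    case_sensitive = "cs" in flags
    if "as" in flags:
        accent_sensitive = True
    elif "ai" in flags:
        accent_sensitive = False
    else:
        # No _ai or _as: _ci implies _ai, and _cs implies _as.
        accent_sensitive = case_sensitive

    return {"charset": charset, "binary": False,
            "case_sensitive": case_sensitive,
            "accent_sensitive": accent_sensitive}


for name in ("latin1_general_ci", "latin1_general_cs",
             "utf8mb4_0900_ai_ci", "utf8mb4_cs_0900_ai_ci"):
    print(name, describe_collation(name))
```

Reading the suffixes from the right is the design choice that matters here: the same two letters can be a locale code in the middle of a name and a sensitivity flag at the end.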

For Japanese collations, the _ks suffix indicates that a collation is kana-sensitive; that is, it distinguishes Katakana characters from Hiragana characters. Japanese collations without the _ks suffix are not kana-sensitive and treat Katakana and Hiragana characters equal for sorting.

This passage concerns Japanese, so we will rarely need it.

For the binary collation of the binary character set, comparisons are based on numeric byte values. For the _bin collation of a nonbinary character set, comparisons are based on numeric character code values, which differ from byte values for multibyte characters. For information about the differences between the binary collation of the binary character set and the _bin collations of nonbinary character sets, see Section 10.8.5, “The binary Collation Compared to _bin Collations”.
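The difference between byte values and character code values can be seen with two characters and an encoding in which the two orders disagree. The sketch below uses Python's UTF-16 (big-endian) encoder purely as an illustration; the choice of characters and encoding is an assumption made for the example and is not taken from the quoted passage.

```python
a = "\uff5e"      # U+FF5E, inside the Basic Multilingual Plane
b = "\U0001f600"  # U+1F600, outside it (a surrogate pair in UTF-16)

# Order by numeric character code values.
by_code_point = sorted([b, a], key=lambda s: [ord(ch) for ch in s])

# Order by the numeric values of the encoded bytes.
by_bytes = sorted([b, a], key=lambda s: s.encode("utf-16-be"))

print(by_code_point == [a, b])  # True: 0xFF5E < 0x1F600
print(by_bytes == [b, a])       # True: bytes D8 3D DE 00 sort before FF 5E
```

The same two strings sort in opposite orders depending on whether the comparison looks at code values or at raw bytes.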

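From the two parts above, you can see that a collation has a great deal to do, far more than comparing uppercase and lowercase letters. These comparisons are genuinely laborious and carry extra overhead.

That is simply the reality to be faced and the requirement to be met: the requirement has to be satisfied first, and only then does it make sense to talk about reducing the overhead.

One more note: accents only occur in certain character sets.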

The commands SHOW CHARACTER SET; and SHOW COLLATION WHERE Charset = 'utf8mb4'; list the character sets supported by the current database server and the collations available for a given character set.


The output below shows more detail about each collation, including the sort length (Sortlen) and whether it pads (Pad_attribute).

mysql> SHOW COLLATION WHERE Charset = 'utf8mb4';
+----------------------------+---------+-----+---------+----------+---------+---------------+
| Collation                  | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
+----------------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_0900_ai_ci         | utf8mb4 | 255 | Yes     | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_ci         | utf8mb4 | 305 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_cs         | utf8mb4 | 278 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_bin           | utf8mb4 | 309 |         | Yes      |       1 | NO PAD        |
| utf8mb4_bin                | utf8mb4 |  46 |         | Yes      |       1 | PAD SPACE     |
| utf8mb4_croatian_ci        | utf8mb4 | 245 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_cs_0900_ai_ci      | utf8mb4 | 266 |         | Yes      |       0 | NO PAD        |
| utf8mb4_cs_0900_as_cs      | utf8mb4 | 289 |         | Yes      |       0 | NO PAD        |
| utf8mb4_czech_ci           | utf8mb4 | 234 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_danish_ci          | utf8mb4 | 235 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_da_0900_ai_ci      | utf8mb4 | 267 |         | Yes      |       0 | NO PAD        |
| utf8mb4_da_0900_as_cs      | utf8mb4 | 290 |         | Yes      |       0 | NO PAD        |
| utf8mb4_de_pb_0900_ai_ci   | utf8mb4 | 256 |         | Yes      |       0 | NO PAD        |
| utf8mb4_de_pb_0900_as_cs   | utf8mb4 | 279 |         | Yes      |       0 | NO PAD        |
| utf8mb4_eo_0900_ai_ci      | utf8mb4 | 273 |         | Yes      |       0 | NO PAD        |
| utf8mb4_eo_0900_as_cs      | utf8mb4 | 296 |         | Yes      |       0 | NO PAD        |
...
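The pad attribute in the last column controls how trailing spaces are treated in comparisons: with PAD SPACE they are insignificant, with NO PAD they count like any other character. A minimal Python sketch of just that rule follows; `equal_under` is a hypothetical helper, and it deliberately ignores case and accent folding.

```python
def equal_under(a, b, pad_attribute):
    """Compare two strings under a pad attribute (trailing-space rule only)."""
    if pad_attribute == "PAD SPACE":
        # Trailing spaces are insignificant, so drop them before comparing.
        a, b = a.rstrip(" "), b.rstrip(" ")
    return a == b


print(equal_under("abc", "abc  ", "PAD SPACE"))  # True
print(equal_under("abc", "abc  ", "NO PAD"))     # False
print(equal_under("abc", "  abc", "PAD SPACE"))  # False: only trailing spaces are ignored
```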

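Since MySQL 8.0, the default character set is utf8mb4 and the default collation is utf8mb4_0900_ai_ci, that is: 4-byte UTF-8, Unicode 9.0, accent-insensitive, case-insensitive. If you switch to a sensitive collation, you will see a different result.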

mysql> set names 'utf8mb4' collate 'utf8mb4_0900_ai_ci';
Query OK, 0 rows affected (0.00 sec)

mysql> select 'ā' = 'á';
+-------------+
| 'ā' = 'á'   |
+-------------+
|           1 |
+-------------+
1 row in set (0.00 sec)
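Why two visibly different letters compare as equal can be approximated in Python by stripping accents and folding case before comparing. This is a rough sketch under stated assumptions: it uses Unicode decomposition from the standard `unicodedata` module rather than the Unicode Collation Algorithm that the 0900 collations are based on, and `ai_ci_key` is a hypothetical name.

```python
import unicodedata


def ai_ci_key(s):
    """Approximate an accent- and case-insensitive comparison key."""
    # Split each character into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    # Drop the combining marks (the accents), keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so that A and a compare as equal.
    return stripped.casefold()


a_macron = "\u0101"  # a with macron
a_acute = "\u00e1"   # a with acute accent

print(a_macron == a_acute)                        # False: a plain comparison sees two characters
print(ai_ci_key(a_macron) == ai_ci_key(a_acute))  # True: both reduce to "a"
print(ai_ci_key("A") == ai_ci_key(a_acute))       # True: case is ignored as well
```

The plain `==` on the first line plays the role of a sensitive comparison, which is where the different result mentioned above would come from.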
