What is the difference between "UTF-8 with BOM" and "UTF-8 without BOM"?
UTF-8 does not need a BOM, although the Unicode Standard permits one in UTF-8.
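So UTF-8 without a BOM is the standard form; putting a BOM in UTF-8 files is mainly a Microsoft habit. (Incidentally, calling little-endian UTF-16 with a BOM simply "Unicode", without further explanation, is also a Microsoft habit.)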
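The BOM (byte order mark) was designed for UTF-16 and UTF-32, where it marks the byte order. Microsoft uses a BOM in UTF-8 because it clearly distinguishes UTF-8 from ASCII and other encodings, but such files can cause problems on operating systems other than Windows.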
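To make the byte-order point concrete, here is a minimal sketch that encodes the single character U+FEFF in several encodings. The variable names are illustrative only. In UTF-16 and UTF-32 the resulting bytes depend on endianness, while in UTF-8 there is only one possible byte sequence, so the mark carries no byte-order information there.

```python
import codecs

BOM_CHAR = "\ufeff"  # the BOM code point, U+FEFF

# UTF-16: the two byte orders produce different byte sequences.
utf16_le = BOM_CHAR.encode("utf-16-le")  # bytes FF FE
utf16_be = BOM_CHAR.encode("utf-16-be")  # bytes FE FF

# UTF-32: same idea, with four bytes per code unit.
utf32_le = BOM_CHAR.encode("utf-32-le")  # bytes FF FE 00 00
utf32_be = BOM_CHAR.encode("utf-32-be")  # bytes 00 00 FE FF

# UTF-8: a single fixed byte sequence, independent of endianness.
utf8 = BOM_CHAR.encode("utf-8")  # bytes EF BB BF

for name, raw in [("utf-16-le", utf16_le), ("utf-16-be", utf16_be),
                  ("utf-32-le", utf32_le), ("utf-32-be", utf32_be),
                  ("utf-8", utf8)]:
    print(f"{name:10s} {raw.hex(' ')}")
```

The standard library exposes the same byte strings as constants such as `codecs.BOM_UTF8` and `codecs.BOM_UTF16_LE`.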
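The only difference between "UTF-8" and "UTF-8 with BOM" is the BOM itself, that is, whether the file begins with U+FEFF (encoded in UTF-8 as the three bytes EF BB BF).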
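As a rough illustration of that difference, the sketch below checks whether some bytes begin with the UTF-8 BOM. The helper name `has_utf8_bom` and the sample text are made up for this example; `utf-8-sig` is Python's name for the codec that writes the BOM on encode.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data begins with the bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

text = "héllo"
plain = text.encode("utf-8")         # UTF-8 without BOM
with_bom = text.encode("utf-8-sig")  # UTF-8 with BOM prepended

print(has_utf8_bom(plain))     # False
print(has_utf8_bom(with_bom))  # True
print(with_bom == codecs.BOM_UTF8 + plain)  # True: the payload is identical
```

For a file on disk, reading the first three bytes in binary mode and passing them to the same helper would be enough.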
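Web pages encoded in UTF-8 should not use a BOM, otherwise things often go wrong. Here is a small example: "Why does the browser treat the information inside <head> of this page's code as being inside <body>?"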
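One possible way to avoid shipping a BOM in page source is to decode with Python's `utf-8-sig` codec, which drops a single leading BOM if there is one, and then re-encode as plain UTF-8. This is only a sketch: the function name and the sample HTML bytes are hypothetical.

```python
def decode_without_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, discarding one leading BOM if present."""
    return data.decode("utf-8-sig")

# Hypothetical page source saved by an editor that adds a BOM.
html_with_bom = b"\xef\xbb\xbf<!DOCTYPE html><html><head></head></html>"

kept = html_with_bom.decode("utf-8")         # plain "utf-8" keeps U+FEFF
stripped = decode_without_bom(html_with_bom)  # "utf-8-sig" removes it
clean_bytes = stripped.encode("utf-8")        # BOM-free bytes to write back

print(repr(kept[0]))   # '\ufeff'
print(stripped[:15])   # <!DOCTYPE html>
```

Input that has no BOM passes through `decode_without_bom` unchanged, so the function can be applied to every file without checking first.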
Also attached is a passage from The Unicode Standard, Version 6.0, section 3.10, D95 (UTF-8 encoding scheme):
While there is obviously no need for a byte order signature when using UTF-8, there are occasions when processes convert UTF-16 or UTF-32 data containing a byte order mark into UTF-8. When represented in UTF-8, the byte order mark turns into the byte sequence <EF BB BF>. Its usage at the beginning of a UTF-8 data stream is neither required nor recommended by the Unicode Standard, but its presence does not affect conformance to the UTF-8 encoding scheme. Identification of the <EF BB BF> byte sequence at the beginning of a data stream can, however, be taken as a near-certain indication that the data stream is using the UTF-8 encoding scheme.
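The last sentence of the quote suggests using a leading BOM as a hint about the encoding. A minimal sketch of that idea follows. The function name `sniff_bom` and the returned labels are choices made for this example, not part of any standard API. The UTF-32 marks are tested before the UTF-16 ones because the UTF-32 little-endian BOM begins with the same two bytes as the UTF-16 little-endian BOM.

```python
import codecs
from typing import Optional

# Longer BOMs come first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes) -> Optional[str]:
    """Guess the encoding from a leading BOM; None if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None

print(sniff_bom(b"\xef\xbb\xbfabc"))  # utf-8
print(sniff_bom(b"abc"))              # None
```

A `None` result says nothing about the encoding: BOM-less UTF-8 is the normal case, so a real detector would need a fallback. The check is also a heuristic: UTF-16 little-endian data whose first character after the BOM is U+0000 would be reported as UTF-32 little-endian.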