Difference between revisions of "Unicode"
Latest revision as of 22:15, 29 October 2008
Unicode is an industry standard allowing computers to consistently represent and manipulate text expressed in most of the world's writing systems.
Encoding
Unicode can be encoded in several ways. In Windows (and thus Visual Prolog) the most interesting formats are UTF-8 and UTF-16 (Little Endian). UTF is an acronym for Unicode Transformation Format. UTF-8 is an 8-bit encoding in which standard ASCII characters are kept as they are (one byte each), while characters that are rarer in western languages are encoded as sequences of two, three or four bytes.
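The variable length of UTF-8 can be illustrated with a small sketch. It is written in Python rather than Visual Prolog, purely to show the byte sequences; the helper name utf8_length and the sample characters are chosen here for illustration and are not part of PFC.

```python
# Sample characters from different Unicode ranges (illustrative choice):
# U+0041 "A", U+00E9 "e with acute", U+20AC euro sign, U+1D11E musical G clef
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]


def utf8_length(ch):
    """Return how many bytes the single character ch occupies in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    # Print the code point, the UTF-8 bytes in hex, and the byte count
    print("U+%04X -> %s (%d bytes)" % (ord(ch), encoded.hex(" ").upper(), len(encoded)))
```

The ASCII letter stays a single byte with its original value (41), while the other three characters take two, three and four bytes respectively.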
In UTF-16 Little Endian, each Unicode character is encoded as a sequence of one or two 16-bit units. UTF-16 is used internally in Windows API calls. Little Endian means that the least significant byte of each 16-bit unit is stored before the most significant byte.
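As a rough illustration of the unit count and the byte order, the following Python sketch splits a character into its UTF-16 Little Endian units. The helper name utf16le_units is hypothetical and used only for this example.

```python
def utf16le_units(ch):
    """Return the 16-bit code units of ch when encoded as UTF-16 Little Endian."""
    raw = ch.encode("utf-16-le")
    # Each unit occupies two bytes, least significant byte first
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# "A" (U+0041) fits in one unit; its bytes are stored as 41 00
print("A".encode("utf-16-le").hex(" ").upper())

# U+1D11E lies outside the 16-bit range and needs two units
print([hex(unit) for unit in utf16le_units("\U0001D11E")])
```

The first line printed is 41 00, showing the least significant byte first. The second character yields the two units 0xd834 and 0xdd1e.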
Byte-order mark
A byte-order mark (BOM) is the Unicode character with hex value 0xFEFF.
It is optionally written at the start of Unicode files to clarify the byte order of the file.
Encoding | Representation (hexadecimal) |
---|---|
UTF-16 Big Endian | FE FF |
UTF-16 Little Endian | FF FE |
It is also optionally written at the start of UTF-8 files to indicate that the UTF-8 format is used. In UTF-8 the BOM is the byte sequence EF BB BF. It can also be used in other Unicode transformation formats.
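The byte sequences above can be used to guess the encoding of a file from its first bytes. The following Python sketch shows one way to do so; the function name detect_bom is an assumption made for this example, and the sketch deliberately ignores UTF-32, whose Little Endian BOM also begins with FF FE.

```python
import codecs

# Byte-order marks for the encodings discussed above
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),                   # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16 Big Endian"),   # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16 Little Endian"),  # FF FE
]


def detect_bom(data):
    """Return the encoding name indicated by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


# The BOM is simply the character U+FEFF encoded in the respective format
print("\ufeff".encode("utf-16-le").hex(" ").upper())
print(detect_bom(b"\xef\xbb\xbfhello"))
print(detect_bom(b"hello"))
```

Because the BOM is optional, a result of None does not mean the data is not Unicode; it only means that no mark was written.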
Unicode support in Visual Prolog
In Visual Prolog the data type string represents Unicode strings. 8-bit character strings are represented by the data type string8. PFC targets Unicode strings by default, but it also gives some support for 8-bit character sets. Files can, for example, be read and written in 8-bit character sets.
UTF-8 is represented as string8 strings.
Unicode files created using PFC will get a byte-order mark:
S = outputStream_file::create("file.text"), % create a Unicode file that starts with a BOM
To create a Unicode file without a byte order mark, create the file as an 8-bit file and then change the stream to Unicode format afterwards:
S = outputStream_file::create8("file.text"),
S:setMode(stream::unicode()), % create a Unicode file that starts without a BOM