.\" Copyright (c) 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Paul Borman at Krystal Technologies.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the University of
.\" California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)utf2.4 8.1 (Berkeley) 6/4/93
.\"
.Dd "June 4, 1993"
.Dt UTF2 4
.Os
.Sh NAME
.Nm UTF2
.Nd Universal character set Transformation Format encoding of runes
.Sh SYNOPSIS
\fBENCODING "UTF2"\fP
.Sh DESCRIPTION
The
.Nm UTF2
encoding is based on a proposed X/Open multibyte
\s-1FSS-UCS-TF\s+1 (File System Safe Universal Character Set Transformation Format) encoding as used in
.Nm Plan 9 from Bell Labs.
Although it is capable of representing more than 16 bits,
the current implementation is limited to 16 bits as defined by the
Unicode Standard.
UTF2 is also known as UTF-8.
.Pp
The
.Nm UTF2
representation is backward compatible with ASCII, so 0x00-0x7f refer to the
ASCII character set.
The multibyte encodings of runes between 0x0080 and 0xffff
consist entirely of bytes whose high-order bit is set.
The actual encoding is represented by the following table:
.Bd -literal
[0x0000 - 0x007f] [00000000.0bbbbbbb] -> 0bbbbbbb
[0x0080 - 0x07ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
[0x0800 - 0xffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
.Ed
.sp
If more than one representation of a value exists (for example,
0x00; 0xC0 0x80; 0xE0 0x80 0x80), the shortest representation is always
used when encoding, but the longer ones are still decoded correctly.
.Pp
The final three encodings provided by X/Open,
which cover the entire proposed ISO 10646 31-bit standard,
are currently not implemented:
.Bd -literal
[00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb

[000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb

[0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
.Ed
.Sh "SEE ALSO"
.Xr mklocale 1 ,
.Xr setlocale 3