=head1 NAME

Locale::Script - ISO codes for script identification (ISO 15924)

=head1 SYNOPSIS

    use Locale::Script;
    use Locale::Constants;

    $script = code2script('ph');                # 'Phoenician'
    $code   = script2code('Tibetan');           # 'bo'
    $code3  = script2code('Tibetan',
                          LOCALE_CODE_ALPHA_3); # 'bod'
    $codeN  = script2code('Tibetan',
                          LOCALE_CODE_NUMERIC); # 330

    @codes   = all_script_codes();
    @scripts = all_script_names();

=head1 DESCRIPTION

The C<Locale::Script> module provides access to the ISO
codes for identifying scripts, as defined in ISO 15924.
For example, Egyptian hieroglyphs are denoted by the two-letter
code 'eg', the three-letter code 'egy', and the numeric code 050.

You can access the codes either through the conversion routines
(described below) or through the two functions that return lists
of all script codes or all script names.

There are three different code sets you can use for identifying
scripts:

=over 4

=item B<alpha-2>

Two-letter codes, such as 'bo' for Tibetan.
This code set is identified with the symbol C<LOCALE_CODE_ALPHA_2>.

=item B<alpha-3>

Three-letter codes, such as 'ell' for Greek.
This code set is identified with the symbol C<LOCALE_CODE_ALPHA_3>.

=item B<numeric>

Numeric codes, such as 410 for Hiragana.
This code set is identified with the symbol C<LOCALE_CODE_NUMERIC>.

=back

Each routine accepts an optional final argument that specifies
the code set to use; C<script_code2code()>, which converts between
code sets, instead takes both code sets as required arguments.
If no code set is specified, the two-letter codes are used.
This is partly for backwards compatibility (previous versions
of the Locale modules only supported the alpha-2 codes) and
partly because they are the most widely used codes.

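The three code sets can be pictured as parallel columns in one table, with the code set argument choosing the column and alpha-2 used when it is omitted. The following is a minimal, self-contained sketch of that idea, not the module's actual implementation: C<%SCRIPTS>, C<code_for()> and the column names C<alpha2>, C<alpha3> and C<numeric> are hypothetical, name matching is exact for brevity, and the table holds only the two scripts whose codes in all three sets are quoted in this document.

    use strict;
    use warnings;

    # Hypothetical table: one record per script, one key per code set.
    # Only the two scripts whose codes in all three sets are quoted in
    # this document are included.
    my %SCRIPTS = (
        'Tibetan'              => { alpha2 => 'bo', alpha3 => 'bod', numeric => '330' },
        'Egyptian hieroglyphs' => { alpha2 => 'eg', alpha3 => 'egy', numeric => '050' },
    );

    # Return the code for $name in $codeset, falling back to alpha-2
    # when no code set is given; undef for an unknown name or set.
    sub code_for {
        my ($name, $codeset) = @_;
        $codeset = 'alpha2' unless defined $codeset;
        my $record = $SCRIPTS{$name};
        return undef unless $record;
        return $record->{$codeset};
    }

    print code_for('Tibetan'), "\n";               # bo
    print code_for('Tibetan', 'numeric'), "\n";    # 330
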
The alpha-2 and alpha-3 codes are not case-sensitive,
so you can use 'BO', 'Bo', 'bO' or 'bo' for Tibetan.
When a code is returned by one of the functions in
this module, it will always be lower-case.

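Case-insensitive matching of this kind usually comes down to normalising a code before looking it up. A minimal sketch, using a hypothetical C<normalise_code()> helper that is not part of Locale::Script: alphabetic codes are folded to lower case, the same form the module returns. Zero-padding numeric codes to three digits, so that 50 and '050' compare equal, is an extra assumption of this sketch rather than documented behaviour.

    use strict;
    use warnings;

    # Hypothetical helper: fold alpha-2/alpha-3 codes to lower case.
    # ASSUMPTION: numeric codes are zero-padded to three digits so that
    # 50 and '050' name the same script; the module may differ.
    sub normalise_code {
        my ($code) = @_;
        return sprintf('%03d', $code) if $code =~ /^\d+$/;
        return lc $code;
    }

    # Every spelling of Tibetan's alpha-2 code normalises to 'bo'.
    for my $input (qw(BO Bo bO bo)) {
        print normalise_code($input), "\n";
    }
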
=head2 SPECIAL CODES

The standard defines several special codes:

=over 4

=item *

The standard reserves the codes in the ranges B<qa> - B<qt>,
B<qaa> - B<qat>, and B<900> - B<919> for private use.

=item *

B<zx>, B<zxx>, and B<997> are the codes for unwritten languages.

=item *

B<zy>, B<zyy>, and B<998> are the codes for an undetermined script.

=item *

B<zz>, B<zzz>, and B<999> are the codes for an uncoded script.

=back

The private-use codes are not recognised by Locale::Script,
but the other special codes are.

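Since Locale::Script does not recognise the private-use codes, a caller that needs to detect them can test the ranges directly. The helper below, C<is_private_use()>, is hypothetical and not part of the module; it accepts a code from any of the three code sets and folds case as the module does for alphabetic codes.

    use strict;
    use warnings;

    # Hypothetical helper: true if $code falls in one of the
    # private-use ranges reserved by ISO 15924 (as listed above).
    sub is_private_use {
        my ($code) = @_;
        $code = lc $code;
        return 1 if $code =~ /^q[a-t]$/;     # alpha-2: qa - qt
        return 1 if $code =~ /^qa[a-t]$/;    # alpha-3: qaa - qat
        return 1 if $code =~ /^\d{3}$/ && $code >= 900 && $code <= 919;
        return 0;
    }

    for my $code (qw(qb QT qaz 905 999 zz)) {
        printf "%-4s %s\n", $code,
            is_private_use($code) ? 'private' : 'not private';
    }
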
=head1 CONVERSION ROUTINES

There are three conversion routines: C<code2script()>, C<script2code()>,
and C<script_code2code()>.

=over 4

=item code2script( CODE, [ CODESET ] )

This function takes a script code and returns a string
containing the name of the script it identifies.
If the code is not a valid script code, as defined by ISO 15924,
then C<undef> is returned. For example:

    $script = code2script('cy');    # 'Cyrillic'

=item script2code( STRING, [ CODESET ] )

This function takes a script name and returns the corresponding
script code, if one exists.
If the argument cannot be identified as a script name,
then C<undef> is returned. For example:

    $code = script2code('Gothic', LOCALE_CODE_ALPHA_3);
    # $code will now be 'gth'

The case of the script name is not important, but apart from case
the name must match exactly; see L</"KNOWN BUGS AND LIMITATIONS"> below.

=item script_code2code( CODE, CODESET, CODESET )

This function takes a script code from one code set
and returns the corresponding code from another code set:

    # Separate the constants with a comma, not =>, which would
    # quote LOCALE_CODE_ALPHA_3 as a literal string.
    $alpha2 = script_code2code('jwi',
                               LOCALE_CODE_ALPHA_3, LOCALE_CODE_ALPHA_2);
    # $alpha2 will now be 'jw' (Javanese)

If the code passed is not a valid script code in
the first code set, or if there is no code for the
corresponding script in the second code set,
then C<undef> is returned.

=back

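One way to picture C<script_code2code()> is as a round trip through the script name: find the script that has the given code in the first code set, then return that script's code in the second. The sketch below illustrates that reading with a hypothetical table and helpers (C<%BY_NAME>, C<name_for()>, C<convert()>); it does not claim to be how the module is implemented. Because the table only holds codes quoted in this document, Javanese has no numeric entry, which exercises the second C<undef> case.

    use strict;
    use warnings;

    # Hypothetical table keyed by script name; a missing key means the
    # script has no code in that set (only codes quoted in this
    # document are included).
    my %BY_NAME = (
        'Tibetan'  => { alpha2 => 'bo', alpha3 => 'bod', numeric => '330' },
        'Javanese' => { alpha2 => 'jw', alpha3 => 'jwi' },
    );

    # Code -> script name within one code set (case-insensitive).
    sub name_for {
        my ($code, $set) = @_;
        for my $name (keys %BY_NAME) {
            my $c = $BY_NAME{$name}{$set};
            return $name if defined $c && $c eq lc($code);
        }
        return undef;
    }

    # Convert a code between sets; undef if the code is unknown in
    # $from or the script has no code in $to.
    sub convert {
        my ($code, $from, $to) = @_;
        my $name = name_for($code, $from);
        return undef unless defined $name;
        return $BY_NAME{$name}{$to};
    }

    print convert('JWI', 'alpha3', 'alpha2'), "\n";    # jw

A linear scan keeps the sketch short; a real implementation would more likely keep a reverse index for each code set.
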
=head1 QUERY ROUTINES

There are two functions that return a list of all script codes
or all script names:

=over 4

=item C<all_script_codes ( [ CODESET ] )>

Returns a list of all script codes in the specified code set
(the two-letter codes if no code set is given).
Alphabetic codes are always lower-case, and the codes
are not returned in any particular order.

=item C<all_script_names ( [ CODESET ] )>

Returns a list of all script names for which there is a corresponding
script code in the specified code set.
The names are capitalised and are not returned in any particular order.

=back

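Because both query routines return their lists in no particular order, callers usually sort the results before displaying them. The sketch below shows one way such lists can be derived from a table, using a hypothetical C<%BY_NAME> and hypothetical C<codes_in()> and C<names_in()> helpers rather than the module's own routines. As described above, a name is only listed if its script has a code in the requested set.

    use strict;
    use warnings;

    # Hypothetical table; Cyrillic and Gothic carry only the codes
    # quoted in this document.
    my %BY_NAME = (
        'Tibetan'  => { alpha2 => 'bo', alpha3 => 'bod', numeric => '330' },
        'Cyrillic' => { alpha2 => 'cy' },
        'Gothic'   => { alpha3 => 'gth' },
    );

    # All codes in one code set, sorted for stable output.
    sub codes_in {
        my ($set) = @_;
        return sort { $a cmp $b }
               grep { defined } map { $_->{$set} } values %BY_NAME;
    }

    # Names of scripts that have a code in the given set, sorted.
    sub names_in {
        my ($set) = @_;
        return sort { $a cmp $b }
               grep { defined $BY_NAME{$_}{$set} } keys %BY_NAME;
    }

    print join(', ', codes_in('alpha2')), "\n";    # bo, cy
    print join(', ', names_in('alpha3')), "\n";    # Gothic, Tibetan
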
=head1 EXAMPLES

The following example illustrates use of the C<code2script()> function.
The user is prompted for a script code and is then told the corresponding
script name:

    use strict;
    use warnings;
    use Locale::Script;
    use Locale::Constants;

    $| = 1;    # autoflush STDOUT so the prompt appears before input is read

    print "Enter script code: ";
    chomp(my $code = <STDIN>);    # chomp removes only the newline
    my $script = code2script($code, LOCALE_CODE_ALPHA_2);
    if (defined $script)
    {
        print "$code = $script\n";
    }
    else
    {
        print "'$code' is not a valid script code!\n";
    }

=head1 KNOWN BUGS AND LIMITATIONS

=over 4

=item *

When using C<script2code()>, the script name must currently match
the name in the source of the module exactly, apart from case.
For example,

    script2code('Egyptian hieroglyphs')

will return B<eg>, as expected. But both of the following will return C<undef>:

    script2code('hieroglyphs')
    script2code('Egyptian Hieroglypics')

If there is a need for it, a future version could accept variant
spellings of script names.

=item *

In the current implementation, all of the data is read in when the
module is loaded and then held in memory.
A lazy implementation, loading the data only when it is first needed,
would be more memory-friendly.

=back

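The variants suggested above could be layered in front of the existing lookup without changing the data: normalise the caller's string, map known alternative spellings to the canonical name, then look that up. Everything in the sketch below is hypothetical (C<%CANONICAL>, C<%ALIASES> and C<lookup_code()>), and the table stands in for the module's data with the single entry used in this section.

    use strict;
    use warnings;

    # Hypothetical stand-in for the module's name -> alpha-2 data,
    # keyed by the lower-cased canonical name.
    my %CANONICAL = ('egyptian hieroglyphs' => 'eg');

    # Hypothetical variant spellings, keyed by their normalised form.
    my %ALIASES = (
        'hieroglyphs'            => 'egyptian hieroglyphs',
        'egyptian hieroglyphics' => 'egyptian hieroglyphs',
    );

    sub lookup_code {
        my ($name) = @_;
        my $key = lc $name;
        $key =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $key =~ s/\s+/ /g;         # collapse internal runs of whitespace
        $key = $ALIASES{$key} if exists $ALIASES{$key};
        return $CANONICAL{$key};   # undef if still unknown
    }

    print lookup_code('Hieroglyphs'), "\n";    # eg

Because the alias table is keyed by the normalised form, each variant needs to be listed only once, whatever its capitalisation or spacing.
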
=head1 SEE ALSO

=over 4

=item Locale::Language

ISO two-letter codes for identification of languages (ISO 639).

=item Locale::Currency

ISO three-letter codes for identification of currencies
and funds (ISO 4217).

=item Locale::Country

ISO codes for identification of countries (ISO 3166).

=item ISO 15924

The ISO standard which defines these codes.

=item http://www.evertype.com/standards/iso15924/

Home page for ISO 15924.

=back

=head1 AUTHOR

Neil Bowers E<lt>neil@bowers.comE<gt>

=head1 COPYRIGHT

Copyright (c) 2002-2004 Neil Bowers.

This module is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

=cut