| 1 | |
| 2 | # Time-stamp: "2004-01-11 18:35:34 AST" |
| 3 | |
| 4 | =head1 NAME |
| 5 | |
| 6 | Locale::Maketext - framework for localization |
| 7 | |
| 8 | =head1 SYNOPSIS |
| 9 | |
| 10 | package MyProgram; |
| 11 | use strict; |
| 12 | use MyProgram::L10N; |
| 13 | # ...which inherits from Locale::Maketext |
| 14 | my $lh = MyProgram::L10N->get_handle() || die "What language?"; |
| 15 | ... |
| 16 | # And then any messages your program emits, like: |
| 17 | warn $lh->maketext( "Can't open file [_1]: [_2]\n", $f, $! ); |
| 18 | ... |
| 19 | |
| 20 | =head1 DESCRIPTION |
| 21 | |
| 22 | It is a common feature of applications (whether run directly, |
| 23 | or via the Web) for them to be "localized" -- i.e., for them |
| 24 | to present an English interface to an English-speaker, a German |
| 25 | interface to a German-speaker, and so on for all languages they're |
| 26 | programmed with. Locale::Maketext |
| 27 | is a framework for software localization; it provides you with the |
| 28 | tools for organizing and accessing the bits of text and text-processing |
| 29 | code that you need for producing localized applications. |
| 30 | |
| 31 | In order to make sense of Maketext and how all its |
| 32 | components fit together, you should probably |
| 33 | go read L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13>, and |
| 34 | I<then> read the following documentation. |
| 35 | |
| 36 | You may also want to read over the source for C<File::Findgrep> |
| 37 | and its constituent modules -- they are a complete (if small) |
| 38 | example application that uses Maketext. |
| 39 | |
| 40 | =head1 QUICK OVERVIEW |
| 41 | |
| 42 | The basic design of Locale::Maketext is object-oriented, and |
| 43 | Locale::Maketext is an abstract base class, from which you |
| 44 | derive a "project class". |
| 45 | The project class (with a name like "TkBocciBall::Localize", |
| 46 | which you then use in your module) is in turn the base class |
| 47 | for all the "language classes" for your project |
| 48 | (with names "TkBocciBall::Localize::it", |
| 49 | "TkBocciBall::Localize::en", |
| 50 | "TkBocciBall::Localize::fr", etc.). |
| 51 | |
| 52 | A language class is |
| 53 | a class containing a lexicon of phrases as class data, |
| 54 | and possibly also some methods that are of use in interpreting |
| 55 | phrases in the lexicon, or otherwise dealing with text in that |
| 56 | language. |
| 57 | |
| 58 | An object belonging to a language class is called a "language |
| 59 | handle"; it's typically a flyweight object. |
| 60 | |
| 61 | The normal course of action is to call: |
| 62 | |
| 63 | use TkBocciBall::Localize; # the localization project class |
| 64 | $lh = TkBocciBall::Localize->get_handle(); |
| 65 | # Depending on the user's locale, etc., this will |
| 66 | # make a language handle from among the classes available, |
| 67 | # and any defaults that you declare. |
| 68 | die "Couldn't make a language handle??" unless $lh; |
| 69 | |
| 70 | From then on, you use the C<maketext> method to access |
| 71 | entries in whatever lexicon(s) belong to the language handle |
| 72 | you got. So, this: |
| 73 | |
| 74 | print $lh->maketext("You won!"), "\n"; |
| 75 | |
| 76 | ...emits the right text for this language. If the object |
| 77 | in C<$lh> belongs to class "TkBocciBall::Localize::fr" and |
| 78 | %TkBocciBall::Localize::fr::Lexicon contains C<("You won!" |
| 79 | =E<gt> "Tu as gagnE<eacute>!")>, then the above |
| 80 | code happily tells the user "Tu as gagnE<eacute>!". |
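As a rough illustration of the overview above, the following sketch puts a project class and two language classes in a single file. Keeping the classes inline (rather than in their own .pm files) and assigning @ISA by hand are simplifications made for this example only; the lexicon contents are just the phrase used above.

```perl
use strict;
use warnings;
use Locale::Maketext;

# Project class: derives from the abstract base class.
package TkBocciBall::Localize;
our @ISA = ('Locale::Maketext');

# Language classes: each derives from the project class
# and carries its own %Lexicon as class data.
package TkBocciBall::Localize::en;
our @ISA = ('TkBocciBall::Localize');
our %Lexicon = ( 'You won!' => 'You won!' );

package TkBocciBall::Localize::fr;
our @ISA = ('TkBocciBall::Localize');
our %Lexicon = ( 'You won!' => "Tu as gagn\x{e9}!" );

package main;

# Ask for French explicitly, so the result doesn't depend on the environment.
my $lh = TkBocciBall::Localize->get_handle('fr')
  or die "Couldn't make a language handle??";

binmode(STDOUT, ':encoding(UTF-8)');
print ref($lh), "\n";                    # the language class that was chosen
print $lh->maketext('You won!'), "\n";   # the French phrase from its lexicon
```

In a real project each class would normally live in its own file under @INC, and C<get_handle> would load it on demand.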
| 81 | |
| 82 | =head1 METHODS |
| 83 | |
| 84 | Locale::Maketext offers a variety of methods, which fall |
| 85 | into three categories: |
| 86 | |
| 87 | =over |
| 88 | |
| 89 | =item * |
| 90 | |
| 91 | Methods to do with constructing language handles. |
| 92 | |
| 93 | =item * |
| 94 | |
| 95 | C<maketext> and other methods to do with accessing %Lexicon data |
| 96 | for a given language handle. |
| 97 | |
| 98 | =item * |
| 99 | |
| 100 | Methods that you may find it handy to use, from routines of |
| 101 | yours that you put in %Lexicon entries. |
| 102 | |
| 103 | =back |
| 104 | |
| 105 | These are covered in the following sections. |
| 106 | |
| 107 | =head2 Construction Methods |
| 108 | |
| 109 | These are to do with constructing a language handle: |
| 110 | |
| 111 | =over |
| 112 | |
| 113 | =item * |
| 114 | |
| 115 | $lh = YourProjClass->get_handle( ...langtags... ) || die "lg-handle?"; |
| 116 | |
| 117 | This tries loading classes based on the language-tags you give (like |
| 118 | C<("en-US", "sk", "kon", "es-MX", "ja", "i-klingon")>), and for the first class |
| 119 | that succeeds, returns YourProjClass::I<language>->new(). |
| 120 | |
| 121 | If it runs thru the entire given list of language-tags, and finds no classes |
| 122 | for those exact terms, it then tries "superordinate" language classes. |
| 123 | So if no "en-US" class (i.e., YourProjClass::en_us) |
| 124 | was found, nor classes for anything else in that list, we then try |
| 125 | its superordinate, "en" (i.e., YourProjClass::en), and so on thru |
| 126 | the other language-tags in the given list: "es". |
| 127 | (The other language-tags in our example list |
| 128 | happen to have no superordinates.) |
| 129 | |
| 130 | If none of those language-tags leads to loadable classes, we then |
| 131 | try classes derived from YourProjClass->fallback_languages() and |
| 132 | then if nothing comes of that, we use classes named by |
| 133 | YourProjClass->fallback_language_classes(). Then in the (probably |
| 134 | quite unlikely) event that that fails, we just return undef. |
| 135 | |
| 136 | =item * |
| 137 | |
| 138 | $lh = YourProjClass->get_handleB<()> || die "lg-handle?"; |
| 139 | |
| 140 | When C<get_handle> is called with an empty parameter list, magic happens: |
| 141 | |
| 142 | If C<get_handle> senses that it's running in a program that was |
| 143 | invoked as a CGI, then it tries to get language-tags out of the |
| 144 | environment variable "HTTP_ACCEPT_LANGUAGE", and it pretends that |
| 145 | those were the languages passed as parameters to C<get_handle>. |
| 146 | |
| 147 | Otherwise (i.e., if not a CGI), this tries various OS-specific ways |
| 148 | to get the language-tags for the current locale/language, and then |
| 149 | pretends that those were the value(s) passed to C<get_handle>. |
| 150 | |
| 151 | Currently this OS-specific stuff consists of looking in the environment |
| 152 | variables "LANG" and "LANGUAGE"; and on MSWin machines (where those |
| 153 | variables are typically unused), this also tries using |
| 154 | the module Win32::Locale to get a language-tag for whatever language/locale |
| 155 | is currently selected in the "Regional Settings" (or "International"?) |
| 156 | Control Panel. I welcome further |
| 157 | suggestions for making this do the Right Thing under other operating |
| 158 | systems that support localization. |
| 159 | |
| 160 | If you're using localization in an application that keeps a configuration |
| 161 | file, you might consider something like this in your project class: |
| 162 | |
| 163 | sub get_handle_via_config { |
| 164 | my $class = $_[0]; |
| 165 | my $preferred_language = $Config_settings{'language'}; |
| 166 | my $lh; |
| 167 | if($preferred_language) { |
| 168 | $lh = $class->get_handle($preferred_language) |
| 169 | || die "No language handle for \"$preferred_language\" or the like"; |
| 170 | } else { |
| 171 | # Config file missing, maybe? |
| 172 | $lh = $class->get_handle() |
| 173 | || die "Can't get a language handle"; |
| 174 | } |
| 175 | return $lh; |
| 176 | } |
| 177 | |
| 178 | =item * |
| 179 | |
| 180 | $lh = YourProjClass::langname->new(); |
| 181 | |
| 182 | This constructs a language handle. You usually B<don't> call this |
| 183 | directly, but instead let C<get_handle> find a language class to C<use> |
| 184 | and to then call ->new on. |
| 185 | |
| 186 | =item * |
| 187 | |
| 188 | $lh->init(); |
| 189 | |
| 190 | This is called by ->new to initialize newly-constructed language handles. |
| 191 | If you define an init method in your class, remember that it's usually |
| 192 | considered a good idea to call $lh->SUPER::init in it (presumably at the |
| 193 | beginning), so that all classes get a chance to initialize a new object |
| 194 | however they see fit. |
| 195 | |
| 196 | =item * |
| 197 | |
| 198 | YourProjClass->fallback_languages() |
| 199 | |
| 200 | C<get_handle> appends the return value of this to the end of |
| 201 | whatever list of languages you pass C<get_handle>. Unless |
| 202 | you override this method, your project class |
| 203 | will inherit Locale::Maketext's C<fallback_languages>, which |
| 204 | currently returns C<('i-default', 'en', 'en-US')>. |
| 205 | ("i-default" is defined in RFC 2277). |
| 206 | |
| 207 | This method (by having it return the name |
| 208 | of a language-tag that has an existing language class) |
| 209 | can be used for making sure that |
| 210 | C<get_handle> will always manage to construct a language |
| 211 | handle (assuming your language classes are in an appropriate |
| 212 | @INC directory). Or you can use the next method: |
| 213 | |
| 214 | =item * |
| 215 | |
| 216 | YourProjClass->fallback_language_classes() |
| 217 | |
| 218 | C<get_handle> appends the return value of this to the end |
| 219 | of the list of classes it will try using. Unless |
| 220 | you override this method, your project class |
| 221 | will inherit Locale::Maketext's C<fallback_language_classes>, |
| 222 | which currently returns an empty list, C<()>. |
| 223 | By setting this to some value (namely, the name of a loadable |
| 224 | language class), you can be sure that |
| 225 | C<get_handle> will always manage to construct a language |
| 226 | handle. |
| 227 | |
| 228 | =back |
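The lookup order described above can be observed with a small sketch. The project class MyApp::L10N and its single language class are hypothetical, and are defined inline only to keep the example self-contained.

```perl
use strict;
use warnings;
use Locale::Maketext;

package MyApp::L10N;            # hypothetical project class
our @ISA = ('Locale::Maketext');

package MyApp::L10N::en;        # the only language class in this sketch
our @ISA = ('MyApp::L10N');
our %Lexicon = ( 'Hello' => 'Hello' );

package main;

# No MyApp::L10N::en_us class exists, so "en-US" resolves to its
# superordinate, "en".
my $from_super = MyApp::L10N->get_handle('en-US')
  or die "No handle for en-US";
print ref($from_super), "\n";

# No class for "ja" exists either; get_handle keeps going through its
# fallbacks and ends up at "en", which is in the default fallback list.
my $from_fallback = MyApp::L10N->get_handle('ja')
  or die "No handle for ja";
print ref($from_fallback), "\n";
```

Both calls should report the MyApp::L10N::en class, since that is the only language class available to load.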
| 229 | |
| 230 | =head2 The "maketext" Method |
| 231 | |
| 232 | This is the most important method in Locale::Maketext: |
| 233 | |
| 234 | $text = $lh->maketext(I<key>, ...parameters for this phrase...); |
| 235 | |
| 236 | This looks in the %Lexicon of the language handle |
| 237 | $lh and all its superclasses, looking |
| 238 | for an entry whose key is the string I<key>. Assuming such |
| 239 | an entry is found, various things then happen, depending on the |
| 240 | value found: |
| 241 | |
| 242 | If the value is a scalarref, the scalar is dereferenced and returned |
| 243 | (and any parameters are ignored). |
| 244 | If the value is a coderef, we return &$value($lh, ...parameters...). |
| 245 | If the value is a string that I<doesn't> look like it's in Bracket Notation, |
| 246 | we return it (after replacing it with a scalarref, in its %Lexicon). |
| 247 | If the value I<does> look like it's in Bracket Notation, then we compile |
| 248 | it into a sub, replace the string in the %Lexicon with the new coderef, |
| 249 | and then we return &$new_sub($lh, ...parameters...). |
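To make the four cases above concrete, here is a small sketch with one lexicon entry of each kind. The class names and the keys ('raw', 'shout', 'plain', 'greet') are made up for this example.

```perl
use strict;
use warnings;
use Locale::Maketext;

package Demo::L10N;             # hypothetical project class
our @ISA = ('Locale::Maketext');

package Demo::L10N::en;
our @ISA = ('Demo::L10N');
our %Lexicon = (
    # scalarref: dereferenced and returned; parameters are ignored
    'raw'   => \ 'Brackets [like these] are left alone',
    # coderef: called as $code->($lh, ...parameters...)
    'shout' => sub { my ($lh, @args) = @_; return uc join(' ', @args) },
    # plain string with no Bracket Notation in it
    'plain' => 'Nothing special here',
    # string in Bracket Notation: compiled into a sub on first use
    'greet' => 'Hello, [_1]!',
);

package main;

my $lh = Demo::L10N->get_handle('en') or die "No language handle";

print $lh->maketext('raw'), "\n";
print $lh->maketext('shout', 'you', 'won'), "\n";
print $lh->maketext('plain'), "\n";
print $lh->maketext('greet', 'World'), "\n";
```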
| 250 | |
| 251 | Bracket Notation is discussed in a later section. Note |
| 252 | that trying to compile a Bracket Notation string can throw |
| 253 | an exception if the string is not syntactically valid (say, by not |
| 254 | balancing brackets right). |
| 255 | |
| 256 | Also, calling &$coderef($lh, ...parameters...) can throw any sort of |
| 257 | exception (if, say, code in that sub tries to divide by zero). But |
| 258 | a very common exception occurs when you have Bracket |
| 259 | Notation text that says to call a method "foo", but there is no such |
| 260 | method. (E.g., "You have [quaB<tn>,_1,ball]." will throw an exception |
| 261 | on trying to call $lh->quaB<tn>($_[1],'ball') -- you presumably meant |
| 262 | "quant".) C<maketext> catches these exceptions, but only to make the |
| 263 | error message more readable, at which point it rethrows the exception. |
| 264 | |
| 265 | An exception I<may> be thrown if I<key> is not found in any |
| 266 | of $lh's %Lexicon hashes. What happens if a key is not found, |
| 267 | is discussed in a later section, "Controlling Lookup Failure". |
| 268 | |
| 269 | Note that you might find it useful in some cases to override |
| 270 | the C<maketext> method with an "after method", if you want to |
| 271 | translate encodings, or even scripts: |
| 272 | |
| 273 | package YrProj::zh_cn; # Chinese with PRC-style glyphs |
| 274 | use base ('YrProj::zh_tw'); # Taiwan-style |
| 275 | sub maketext { |
| 276 | my $self = shift(@_); |
| 277 | my $value = $self->SUPER::maketext(@_); # SUPER:: avoids calling ourselves forever |
| 278 | return Chineeze::taiwan2mainland($value); |
| 279 | } |
| 280 | |
| 281 | Or you may want to override it with something that traps |
| 282 | any exceptions, if that's critical to your program: |
| 283 | |
| 284 | sub maketext { |
| 285 | my($lh, @stuff) = @_; |
| 286 | my $out; |
| 287 | eval { $out = $lh->SUPER::maketext(@stuff) }; |
| 288 | return $out unless $@; |
| 289 | # ...otherwise deal with the exception... |
| 290 | } |
| 291 | |
| 292 | Other than those two situations, I don't imagine that |
| 293 | it's useful to override the C<maketext> method. (If |
| 294 | you run into a situation where it is useful, I'd be |
| 295 | interested in hearing about it.) |
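For what it's worth, here is one way the exception-trapping override could be fleshed out. The fallback text "[untranslatable: ...]" and the class names are assumptions of this sketch, not anything Locale::Maketext prescribes; the 'typo' entry deliberately misspells "quant" to provoke the exception.

```perl
use strict;
use warnings;
use Locale::Maketext;

package Trap::L10N;             # hypothetical project class
our @ISA = ('Locale::Maketext');

# Trap any exception from the real maketext and return a marked-up key
# instead, so that a broken lexicon entry can't kill the program.
sub maketext {
    my ($lh, @stuff) = @_;
    my $out;
    return $out if eval { $out = $lh->SUPER::maketext(@stuff); 1 };
    return "[untranslatable: $stuff[0]]";
}

package Trap::L10N::en;
our @ISA = ('Trap::L10N');
our %Lexicon = (
    'ok'   => 'You have [quant,_1,ball].',
    'typo' => 'You have [quatn,_1,ball].',   # no such method: throws
);

package main;

my $lh = Trap::L10N->get_handle('en') or die "No language handle";
print $lh->maketext('ok', 2), "\n";
print $lh->maketext('typo', 2), "\n";
```

Whether silently substituting a placeholder is appropriate depends on the program; logging the contents of C<$@> before returning is probably wise in most cases.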
| 296 | |
| 297 | =over |
| 298 | |
| 299 | =item $lh->fail_with I<or> $lh->fail_with(I<PARAM>) |
| 300 | |
| 301 | =item $lh->failure_handler_auto |
| 302 | |
| 303 | These two methods are discussed in the section "Controlling |
| 304 | Lookup Failure". |
| 305 | |
| 306 | =back |
| 307 | |
| 308 | =head2 Utility Methods |
| 309 | |
| 310 | These are methods that you may find it handy to use, generally |
| 311 | from %Lexicon routines of yours (whether expressed as |
| 312 | Bracket Notation or not). |
| 313 | |
| 314 | =over |
| 315 | |
| 316 | =item $language->quant($number, $singular) |
| 317 | |
| 318 | =item $language->quant($number, $singular, $plural) |
| 319 | |
| 320 | =item $language->quant($number, $singular, $plural, $negative) |
| 321 | |
| 322 | This is generally meant to be called from inside Bracket Notation |
| 323 | (which is discussed later), as in |
| 324 | |
| 325 | "Your search matched [quant,_1,document]!" |
| 326 | |
| 327 | It's for I<quantifying> a noun (i.e., saying how much of it there is, |
| 328 | while giving the correct form of it). The behavior of this method is |
| 329 | handy for English and a few other Western European languages, and you |
| 330 | should override it for languages where it's not suitable. You can feel |
| 331 | free to read the source, but the current implementation is basically |
| 332 | as this pseudocode describes: |
| 333 | |
| 334 | if $number is 0 and there's a $negative, |
| 335 | return $negative; |
| 336 | elsif $number is 1, |
| 337 | return "1 $singular"; |
| 338 | elsif there's a $plural, |
| 339 | return "$number $plural"; |
| 340 | else |
| 341 | return "$number " . $singular . "s"; |
| 342 | # |
| 343 | # ...except that we actually call numf to |
| 344 | # stringify $number before returning it. |
| 345 | |
| 346 | So for English (with Bracket Notation) |
| 347 | C<"...[quant,_1,file]..."> is fine (for 0 it returns "0 files", |
| 348 | for 1 it returns "1 file", and for more it returns "2 files", etc.) |
| 349 | |
| 350 | But for "directory", you'd want C<"[quant,_1,directory,directories]"> |
| 351 | so that our elementary C<quant> method doesn't think that the |
| 352 | plural of "directory" is "directorys". And you might find that the |
| 353 | output may sound better if you specify a negative form, as in: |
| 354 | |
| 355 | "[quant,_1,file,files,No files] matched your query.\n" |
| 356 | |
| 357 | Remember to keep in mind verb agreement (or adjectives too, in |
| 358 | other languages), as in: |
| 359 | |
| 360 | "[quant,_1,document] were matched.\n" |
| 361 | |
| 362 | Because if _1 is one, you get "1 document B<were> matched". |
| 363 | An acceptable hack here is to do something like this: |
| 364 | |
| 365 | "[quant,_1,document was,documents were] matched.\n" |
| 366 | |
| 367 | =item $language->numf($number) |
| 368 | |
| 369 | This returns the given number formatted nicely according to |
| 370 | this language's conventions. Maketext's default method is |
| 371 | mostly to just take the normal string form of the number |
| 372 | (applying sprintf "%G" for only very large numbers), and then |
| 373 | to add commas as necessary. (Except that |
| 374 | we apply C<tr/,./.,/> if $language->{'numf_comma'} is true; |
| 375 | that's a bit of a hack that's useful for languages that express |
| 376 | two million as "2.000.000" and not as "2,000,000"). |
| 377 | |
| 378 | If you want anything fancier, consider overriding this with something |
| 379 | that uses L<Number::Format|Number::Format>, or does something else |
| 380 | entirely. |
| 381 | |
| 382 | Note that numf is called by quant for stringifying all quantifying |
| 383 | numbers. |
| 384 | |
| 385 | =item $language->sprintf($format, @items) |
| 386 | |
| 387 | This is just a wrapper around Perl's normal C<sprintf> function. |
| 388 | It's provided so that you can use "sprintf" in Bracket Notation: |
| 389 | |
| 390 | "Couldn't access datanode [sprintf,%s=~[%s~],_1,_2]!\n" |
| 391 | |
| 392 | returning... |
| 393 | |
| 394 | Couldn't access datanode Stuff=[thangamabob]! |
| 395 | |
| 396 | =item $language->language_tag() |
| 397 | |
| 398 | Currently this just takes the last bit of C<ref($language)>, turns |
| 399 | underscores to dashes, and returns it. So if $language is |
| 400 | an object of class Hee::HOO::Haw::en_us, $language->language_tag() |
| 401 | returns "en-us". (Yes, the usual representation for that language |
| 402 | tag is "en-US", but case is I<never> considered meaningful in |
| 403 | language-tag comparison.) |
| 404 | |
| 405 | You may override this as you like; Maketext doesn't use it for |
| 406 | anything. |
| 407 | |
| 408 | =item $language->encoding() |
| 409 | |
| 410 | Currently this isn't used for anything, but it's provided |
| 411 | (with default value of |
| 412 | C<(ref($language) && $language-E<gt>{'encoding'}) or "iso-8859-1"> |
| 413 | ) as a sort of suggestion that it may be useful/necessary to |
| 414 | associate encodings with your language handles (whether on a |
| 415 | per-class or even per-handle basis.) |
| 416 | |
| 417 | =back |
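The following sketch exercises C<quant>, C<numf>, and C<language_tag> as described above, using the inherited default implementations. The class names and the short lexicon keys ('files', 'dirs', 'matched') are invented for the example.

```perl
use strict;
use warnings;
use Locale::Maketext;

package Util::L10N;             # hypothetical project class
our @ISA = ('Locale::Maketext');

package Util::L10N::en_us;
our @ISA = ('Util::L10N');
our %Lexicon = (
    'files'   => '[quant,_1,file]',
    'dirs'    => '[quant,_1,directory,directories]',
    'matched' => '[quant,_1,file,files,No files] matched your query.',
);

package main;

my $lh = Util::L10N->get_handle('en-US') or die "No language handle";

# quant with only a singular form: the plural is formed by adding "s"
print $lh->maketext('files', $_), "\n" for 0, 1, 2;

# quant with an explicit plural, and with a negative (zero) form
print $lh->maketext('dirs', 2), "\n";
print $lh->maketext('matched', 0), "\n";

# numf adds commas; language_tag is derived from the class name
print $lh->numf(1234567), "\n";
print $lh->language_tag(), "\n";
```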
| 418 | |
| 419 | =head2 Language Handle Attributes and Internals |
| 420 | |
| 421 | A language handle is a flyweight object -- i.e., it doesn't (necessarily) |
| 422 | carry any data of interest, other than just being a member of |
| 423 | whatever class it belongs to. |
| 424 | |
| 425 | A language handle is implemented as a blessed hash. Subclasses of yours |
| 426 | can store whatever data you want in the hash. Currently the only hash |
| 427 | entry used by any crucial Maketext method is "fail", so feel free to |
| 428 | use anything else as you like. |
| 429 | |
| 430 | B<Remember: Don't be afraid to read the Maketext source if there's |
| 431 | any point on which this documentation is unclear.> This documentation |
| 432 | is vastly longer than the module source itself. |
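As a small sketch of storing data in the handle's hash, a language class can set entries from its own C<init>. Here the 'numf_comma' entry is the one that the default C<numf> consults; the 'app_currency' entry and the class names are purely hypothetical.

```perl
use strict;
use warnings;
use Locale::Maketext;

package Attr::L10N;             # hypothetical project class
our @ISA = ('Locale::Maketext');

package Attr::L10N::de;
our @ISA = ('Attr::L10N');
our %Lexicon = ( 'Total: [_1]' => 'Summe: [numf,_1]' );

sub init {
    my $lh = shift;
    $lh->SUPER::init(@_);             # let ancestor classes initialize first
    $lh->{'numf_comma'}   = 1;        # numf then swaps "," and "."
    $lh->{'app_currency'} = 'EUR';    # hypothetical application-specific data
    return;
}

package main;

my $lh = Attr::L10N->get_handle('de') or die "No language handle";
print $lh->maketext('Total: [_1]', 2000000), "\n";
print $lh->{'app_currency'}, "\n";
```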
| 433 | |
| 434 | =over |
| 435 | |
| 436 | =back |
| 437 | |
| 438 | =head1 LANGUAGE CLASS HIERARCHIES |
| 439 | |
| 440 | These are Locale::Maketext's assumptions about the class |
| 441 | hierarchy formed by all your language classes: |
| 442 | |
| 443 | =over |
| 444 | |
| 445 | =item * |
| 446 | |
| 447 | You must have a project base class, which you load, and |
| 448 | which you then use as the first argument in |
| 449 | the call to YourProjClass->get_handle(...). It should derive |
| 450 | (whether directly or indirectly) from Locale::Maketext. |
| 451 | It B<doesn't matter> how you name this class, altho assuming this |
| 452 | is the localization component of your Super Mega Program, |
| 453 | good names for your project class might be |
| 454 | SuperMegaProgram::Localization, SuperMegaProgram::L10N, |
| 455 | SuperMegaProgram::I18N, SuperMegaProgram::International, |
| 456 | or even SuperMegaProgram::Languages or SuperMegaProgram::Messages. |
| 457 | |
| 458 | =item * |
| 459 | |
| 460 | Language classes are what YourProjClass->get_handle will try to load. |
| 461 | It will look for them by taking each language-tag (B<skipping> it |
| 462 | if it doesn't look like a language-tag or locale-tag!), turning it to |
| 463 | all lowercase, turning any dashes to underscores, and appending it |
| 464 | to YourProjClass . "::". So this: |
| 465 | |
| 466 | $lh = YourProjClass->get_handle( |
| 467 | 'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized' |
| 468 | ); |
| 469 | |
| 470 | will try loading the classes |
| 471 | YourProjClass::en_us (note lowercase!), YourProjClass::fr, |
| 472 | YourProjClass::kon, |
| 473 | YourProjClass::i_klingon |
| 474 | and YourProjClass::i_klingon_romanized. (And it'll stop at the |
| 475 | first one that actually loads.) |
| 476 | |
| 477 | =item * |
| 478 | |
| 479 | I assume that each language class derives (directly or indirectly) |
| 480 | from your project class, and also defines its @ISA, its %Lexicon, |
| 481 | or both. But I anticipate no dire consequences if these assumptions |
| 482 | do not hold. |
| 483 | |
| 484 | =item * |
| 485 | |
| 486 | Language classes may derive from other language classes (altho they |
| 487 | should have "use I<Thatclassname>" or "use base qw(I<...classes...>)"). |
| 488 | They may derive from the project |
| 489 | class. They may derive from some other class altogether. Or via |
| 490 | multiple inheritance, they may derive from any mixture of these. |
| 491 | |
| 492 | =item * |
| 493 | |
| 494 | I foresee no problems with having multiple inheritance in |
| 495 | your hierarchy of language classes. (As usual, however, Perl will |
| 496 | complain bitterly if you have a cycle in the hierarchy: i.e., if |
| 497 | any class is its own ancestor.) |
| 498 | |
| 499 | =back |
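The mapping from language-tag to class name described above can be approximated in a few lines. This is only a sketch of the naming rule (the helper C<langtag_to_class> is hypothetical, and it does not check whether the tag looks valid); it is not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase the tag, turn dashes into underscores,
# and append the result to the project class name plus "::".
sub langtag_to_class {
    my ($project_class, $tag) = @_;
    (my $name = lc $tag) =~ tr/-/_/;
    return $project_class . '::' . $name;
}

print langtag_to_class('YourProjClass', $_), "\n"
  for 'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized';
```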
| 500 | |
| 501 | =head1 ENTRIES IN EACH LEXICON |
| 502 | |
| 503 | A typical %Lexicon entry is meant to signify a phrase, |
| 504 | taking some number (0 or more) of parameters. An entry |
| 505 | is meant to be accessed via |
| 506 | a string I<key> in $lh->maketext(I<key>, ...parameters...), |
| 507 | which should return a string that is generally meant to |
| 508 | be used for "output" to the user -- regardless of whether |
| 509 | this actually means printing to STDOUT, writing to a file, |
| 510 | or putting into a GUI widget. |
| 511 | |
| 512 | While the key must be a string value (since that's a basic |
| 513 | restriction that Perl places on hash keys), the value in |
| 514 | the lexicon can currently be of several types: |
| 515 | a defined scalar, scalarref, or coderef. The use of these is |
| 516 | explained above, in the section 'The "maketext" Method', and |
| 517 | Bracket Notation for strings is discussed in the next section. |
| 518 | |
| 519 | While you can use arbitrary unique IDs for lexicon keys |
| 520 | (like "_min_larger_max_error"), it is often |
| 521 | useful if an entry's key is itself a valid value, like |
| 522 | this example error message: |
| 523 | |
| 524 | "Minimum ([_1]) is larger than maximum ([_2])!\n", |
| 525 | |
| 526 | Compare this code that uses an arbitrary ID... |
| 527 | |
| 528 | die $lh->maketext( "_min_larger_max_error", $min, $max ) |
| 529 | if $min > $max; |
| 530 | |
| 531 | ...to this code that uses a key-as-value: |
| 532 | |
| 533 | die $lh->maketext( |
| 534 | "Minimum ([_1]) is larger than maximum ([_2])!\n", |
| 535 | $min, $max |
| 536 | ) if $min > $max; |
| 537 | |
| 538 | The second is, in short, more readable. In particular, it's obvious |
| 539 | that the number of parameters you're feeding to that phrase (two) is |
| 540 | the number of parameters that it I<wants> to be fed. (Since you see |
| 541 | a _1 and a _2 being used in the key there.) |
| 542 | |
| 543 | Also, once a project is otherwise |
| 544 | complete and you start to localize it, you can scrape together |
| 545 | all the various keys you use, and pass them to a translator; and then |
| 546 | the translator's work will go faster if what he's presented is this: |
| 547 | |
| 548 | "Minimum ([_1]) is larger than maximum ([_2])!\n", |
| 549 | => "", # fill in something here, Jacques! |
| 550 | |
| 551 | rather than this more cryptic mess: |
| 552 | |
| 553 | "_min_larger_max_error" |
| 554 | => "", # fill in something here, Jacques |
| 555 | |
| 556 | I think that using keys that are valid lexicon values makes the completed lexicon |
| 557 | entries more readable: |
| 558 | |
| 559 | "Minimum ([_1]) is larger than maximum ([_2])!\n", |
| 560 | => "Le minimum ([_1]) est plus grand que le maximum ([_2])!\n", |
| 561 | |
| 562 | Also, having valid values as keys becomes very useful if you set |
| 563 | up an _AUTO lexicon. _AUTO lexicons are discussed in a later |
| 564 | section. |
| 565 | |
| 566 | I almost always use keys that are themselves |
| 567 | valid lexicon values. One notable exception is when the value is |
| 568 | quite long. For example, to get the screenful of data that |
| 569 | a command-line program might return when given an unknown switch, |
| 570 | I often just use a key "_USAGE_MESSAGE". At that point I then go |
| 571 | and immediately define that lexicon entry in the |
| 572 | ProjectClass::L10N::en lexicon (since English is always my "project |
| 573 | language"): |
| 574 | |
| 575 | '_USAGE_MESSAGE' => <<'EOSTUFF', |
| 576 | ...long long message... |
| 577 | EOSTUFF |
| 578 | |
| 579 | and then I can use it as: |
| 580 | |
| 581 | getopt('oDI', \%opts) or die $lh->maketext('_USAGE_MESSAGE'); |
| 582 | |
| 583 | Incidentally, |
| 584 | note that each class's C<%Lexicon> inherits-and-extends |
| 585 | the lexicons in its superclasses. This is not because these are |
| 586 | special hashes I<per se>, but because you access them via the |
| 587 | C<maketext> method, which looks for entries across all the |
| 588 | C<%Lexicon>'s in a language class I<and> all its ancestor classes. |
| 589 | (This is because the idea of "class data" isn't directly implemented |
| 590 | in Perl, but is instead left to individual class-systems to implement |
| 591 | as they see fit.) |
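A brief sketch of this inherit-and-extend behavior: below, a hypothetical "en_us" class derives from an "en" class, overrides one entry, and picks up the other from its parent. The class names and phrases are invented for the example.

```perl
use strict;
use warnings;
use Locale::Maketext;

package Inh::L10N;              # hypothetical project class
our @ISA = ('Locale::Maketext');

package Inh::L10N::en;
our @ISA = ('Inh::L10N');
our %Lexicon = (
    'Hello!' => 'Hello!',
    'Trash'  => 'Rubbish bin',
);

package Inh::L10N::en_us;
our @ISA = ('Inh::L10N::en');   # a language class deriving from another
our %Lexicon = (
    'Trash'  => 'Trash can',    # overrides the entry in the parent's lexicon
);

package main;

my $lh = Inh::L10N->get_handle('en-US') or die "No language handle";
print $lh->maketext('Trash'), "\n";    # found in en_us's own %Lexicon
print $lh->maketext('Hello!'), "\n";   # found in the parent's %Lexicon
```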
| 592 | |
| 593 | Note that you may have things stored in a lexicon |
| 594 | besides just phrases for output: for example, if your program |
| 595 | takes input from the keyboard, asking a "(Y/N)" question, |
| 596 | you probably need to know what the equivalent of "Y[es]/N[o]" is |
| 597 | in whatever language. You probably also need to know what |
| 598 | the equivalents of the answers "y" and "n" are. You can |
| 599 | store that information in the lexicon (say, under the keys |
| 600 | "~answer_y" and "~answer_n", and the long forms as |
| 601 | "~answer_yes" and "~answer_no", where "~" is just an ad-hoc |
| 602 | character meant to indicate to programmers/translators that |
| 603 | these are not phrases for output). |
| 604 | |
| 605 | Or instead of storing this in the language class's lexicon, |
| 606 | you can (and, in some cases, really should) represent the same bit |
| 607 | of knowledge as code in a method in the language class. (That |
| 608 | leaves a tidy distinction between the lexicon as the things we |
| 609 | know how to I<say>, and the rest of the things in the language class |
| 610 | as things that we know how to I<do>.) Consider |
| 611 | this example of a processor for responses to French "oui/non" |
| 612 | questions: |
| 613 | |
| 614 | sub y_or_n { |
| 615 | return undef unless defined $_[1] and length $_[1]; |
| 616 | my $answer = lc $_[1]; # smash case |
| 617 | return 1 if $answer eq 'o' or $answer eq 'oui'; |
| 618 | return 0 if $answer eq 'n' or $answer eq 'non'; |
| 619 | return undef; |
| 620 | } |
| 621 | |
| 622 | ...which you'd then call in a construct like this: |
| 623 | |
| 624 | my $response; |
| 625 | until(defined $response) { |
| 626 | print $lh->maketext("Open the pod bay door (y/n)? "); |
| 627 | $response = $lh->y_or_n( get_input_from_keyboard_somehow() ); |
| 628 | } |
| 629 | if($response) { $pod_bay_door->open() } |
| 630 | else { $pod_bay_door->leave_closed() } |
| 631 | |
| 632 | Other data worth storing in a lexicon might be things like |
| 633 | filenames for language-targeted resources: |
| 634 | |
| 635 | ... |
| 636 | "_main_splash_png" |
| 637 | => "/styles/en_us/main_splash.png", |
| 638 | "_main_splash_imagemap" |
| 639 | => "/styles/en_us/main_splash.incl", |
| 640 | "_general_graphics_path" |
| 641 | => "/styles/en_us/", |
| 642 | "_alert_sound" |
| 643 | => "/styles/en_us/hey_there.wav", |
| 644 | "_forward_icon" |
| 645 | => "left_arrow.png", |
| 646 | "_backward_icon" |
| 647 | => "right_arrow.png", |
| 648 | # In some other languages, left equals |
| 649 | # BACKwards, and right is FOREwards. |
| 650 | ... |
| 651 | |
| 652 | You might want to do the same thing for expressing key bindings |
| 653 | or the like (since hardwiring "q" as the binding for the function |
| 654 | that quits a screen/menu/program is useful only if your language |
| 655 | happens to associate "q" with "quit"!) |
| 656 | |
| 657 | =head1 BRACKET NOTATION |
| 658 | |
| 659 | Bracket Notation is a crucial feature of Locale::Maketext. I mean |
| 660 | Bracket Notation to provide a replacement for sprintf formatting. |
| 661 | Everything you do with Bracket Notation could be done with a sub block, |
| 662 | but bracket notation is meant to be much more concise. |
| 663 | |
| 664 | Bracket Notation is like a miniature "template" system (in the sense |
| 665 | of L<Text::Template|Text::Template>, not in the sense of C++ templates), |
| 666 | where normal text is passed thru basically as is, but text in special |
| 667 | regions is specially interpreted. In Bracket Notation, you use brackets |
| 668 | ("[...]" -- not "{...}"!) to note sections that are specially interpreted. |
| 669 | |
| 670 | For example, here all the areas that are taken literally are underlined with |
| 671 | a "^", and all the in-bracket special regions are underlined with an X: |
| 672 | |
| 673 | "Minimum ([_1]) is larger than maximum ([_2])!\n", |
| 674 | ^^^^^^^^^ XX ^^^^^^^^^^^^^^^^^^^^^^^^^^ XX ^^^^ |
| 675 | |
| 676 | When that string is compiled from bracket notation into a real Perl sub, |
| 677 | it's basically turned into: |
| 678 | |
| 679 | sub { |
| 680 | my $lh = $_[0]; |
| 681 | my @params = @_; |
| 682 | return join '', |
| 683 | "Minimum (", |
| 684 | ...some code here... |
| 685 | ") is larger than maximum (", |
| 686 | ...some code here... |
| 687 | ")!\n", |
| 688 | } |
| 689 | # to be called by $lh->maketext(KEY, params...) |
| 690 | |
| 691 | In other words, text outside bracket groups is turned into string |
| 692 | literals. Text in brackets is rather more complex, and currently follows |
| 693 | these rules: |
| 694 | |
| 695 | =over |
| 696 | |
| 697 | =item * |
| 698 | |
| 699 | Bracket groups that are empty, or which consist only of whitespace, |
| 700 | are ignored. (Examples: "[]", "[ ]", or a [ and a ] with returns |
| 701 | and/or tabs and/or spaces between them.) |
| 702 | |
| 703 | Otherwise, each group is taken to be a comma-separated group of items, |
| 704 | and each item is interpreted as follows: |
| 705 | |
| 706 | =item * |
| 707 | |
| 708 | An item that is "_I<digits>" or "_-I<digits>" is interpreted as |
| 709 | $_[I<value>]. I.e., "_1" becomes $_[1], and "_-3" is interpreted |
| 710 | as $_[-3] (in which case @_ should have at least three elements in it). |
| 711 | Note that $_[0] is the language handle, and is typically not named |
| 712 | directly. |
| 713 | |
| 714 | =item * |
| 715 | |
| 716 | An item "_*" is interpreted to mean "all of @_ except $_[0]". |
| 717 | I.e., C<@_[1..$#_]>. Note that this is an empty list in the case |
| 718 | of calls like $lh->maketext(I<key>) where there are no |
| 719 | parameters (except $_[0], the language handle). |
| 720 | |
| 721 | =item * |
| 722 | |
| 723 | Otherwise, each item is interpreted as a string literal. |
| 724 | |
| 725 | =back |
| 726 | |
| 727 | The group as a whole is interpreted as follows: |
| 728 | |
| 729 | =over |
| 730 | |
| 731 | =item * |
| 732 | |
| 733 | If the first item in a bracket group looks like a method name, |
| 734 | then that group is interpreted like this: |
| 735 | |
| 736 | $lh->that_method_name( |
| 737 | ...rest of items in this group... |
| 738 | ), |
| 739 | |
| 740 | =item * |
| 741 | |
| 742 | If the first item in a bracket group is "*", it's taken as shorthand |
| 743 | for the very commonly called "quant" method. Similarly, if the first |
| 744 | item in a bracket group is "#", it's taken to be shorthand for |
| 745 | "numf". |
| 746 | |
| 747 | =item * |
| 748 | |
| 749 | If the first item in a bracket group is empty-string, or "_*" |
| 750 | or "_I<digits>" or "_-I<digits>", then that group is interpreted |
| 751 | as just the interpolation of all its items: |
| 752 | |
| 753 | join('', |
| 754 | ...rest of items in this group... |
| 755 | ), |
| 756 | |
| 757 | Examples: "[_1]" and "[,_1]", which are synonymous; and |
| 758 | "C<[,ID-(,_4,-,_2,)]>", which compiles as |
| 759 | C<join "", "ID-(", $_[4], "-", $_[2], ")">. |
| 760 | |
| 761 | =item * |
| 762 | |
| 763 | Otherwise this bracket group is invalid. For example, in the group |
| 764 | "[!@#,whatever]", the first item C<"!@#"> is neither empty-string, |
| 765 | "_I<number>", "_-I<number>", "_*", nor a valid method name; and so |
| 766 | Locale::Maketext will throw an exception if you try compiling an |
| 767 | expression containing this bracket group. |
| 768 | |
| 769 | =back |
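
The item and group rules above can be sketched in a few lines. The project class C<Demo::L10N>, its language class C<Demo::L10N::en>, and the C<shout> method are hypothetical names made up for this illustration; only C<get_handle> and C<maketext> come from Locale::Maketext itself.

```perl
use strict;
use warnings;

# Hypothetical project class; only the base class is real.
package Demo::L10N;
use parent 'Locale::Maketext';

# A made-up method, so that "[shout,_1]" has something to call.
sub shout { my ($lh, $text) = @_; return uc $text }

# Hypothetical language class with an _AUTO lexicon, so that every
# key below is compiled from Bracket Notation on first use.
package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh = Demo::L10N->get_handle('en') or die "No language handle";

# "_digits" items become the corresponding element of @_.
my $plain  = $lh->maketext('[_1] and [,_2]', 'this', 'that');
# "_*" is every parameter after the language handle.
my $all    = $lh->maketext('[_*]', 'a', 'b', 'c');
# An empty first item means "just join the remaining items".
my $joined = $lh->maketext('[,ID-(,_4,-,_2,)]', 'w', 'x', 'y', 'z');
# A first item that looks like a method name becomes a method call.
my $called = $lh->maketext('[shout,_1]!', 'hello');
# A first item that is none of the above makes the group invalid.
my $ok     = eval { $lh->maketext('[!@#,whatever]'); 1 };

print "$plain\n$all\n$joined\n$called\n";
print "The invalid group was rejected\n" unless $ok;
```

Note that the keys are single-quoted, so Perl itself does not try to interpolate anything that looks like a variable before maketext sees the string.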
| 770 | |
| 771 | Note, incidentally, that items in each group are comma-separated, |
| 772 | not C</\s*,\s*/>-separated. That is, you might expect that this |
| 773 | bracket group: |
| 774 | |
| 775 | "Hoohah [foo, _1 , bar ,baz]!" |
| 776 | |
| 777 | would compile to this: |
| 778 | |
| 779 | sub { |
| 780 | my $lh = $_[0]; |
| 781 | return join '', |
| 782 | "Hoohah ", |
| 783 | $lh->foo( $_[1], "bar", "baz"), |
| 784 | "!", |
| 785 | } |
| 786 | |
| 787 | But it actually compiles as this: |
| 788 | |
| 789 | sub { |
| 790 | my $lh = $_[0]; |
| 791 | return join '', |
| 792 | "Hoohah ", |
| 793 | $lh->foo(" _1 ", " bar ", "baz"), #!!! |
| 794 | "!", |
| 795 | } |
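
One way to see this for yourself is to give the group a method that shows exactly what it received. In this sketch the classes and the C<show> method are hypothetical, invented only for the demonstration:

```perl
use strict;
use warnings;

package Demo::L10N;             # hypothetical project class
use parent 'Locale::Maketext';

# Made-up method: wraps each argument in <> so whitespace is visible.
sub show { my ($lh, @items) = @_; return join '', map { "<$_>" } @items }

package Demo::L10N::en;         # hypothetical language class
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh = Demo::L10N->get_handle('en') or die "No language handle";

# Spaces around the commas stay part of each item, so " _1 " is a
# string literal here, not a reference to the first parameter.
my $spaced = $lh->maketext('Hoohah [show, _1 , bar ,baz]!', 'PARAM');

# Without the stray spaces, _1 is interpolated as intended.
my $tight  = $lh->maketext('Hoohah [show,_1,bar,baz]!', 'PARAM');

print "$spaced\n$tight\n";
```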
| 796 | |
| 797 | In the notation discussed so far, the characters "[" and "]" are given |
| 798 | special meaning, for opening and closing bracket groups, and "," has |
| 799 | a special meaning inside bracket groups, where it separates items in the |
| 800 | group. This raises the question of how you'd express a literal "[" or |
| 801 | "]" in a Bracket Notation string, and how you'd express a literal |
| 802 | comma inside a bracket group. For this purpose I've adopted "~" (tilde) |
| 803 | as an escape character: "~[" means a literal '[' character anywhere |
| 804 | in Bracket Notation (i.e., regardless of whether you're in a bracket |
| 805 | group or not), and ditto for "~]" meaning a literal ']', and "~," meaning |
| 806 | a literal comma. (Altho "," means a literal comma outside of |
| 807 | bracket groups -- it's only inside bracket groups that commas are special.) |
| 808 | |
| 809 | And on the off chance you need a literal tilde in a bracket expression, |
| 810 | you get it with "~~". |
| 811 | |
| 812 | Currently, an unescaped "~" before a character |
| 813 | other than a bracket or a comma is taken to mean just a "~" and that |
| 814 | character. I.e., "~X" means the same as "~~X" -- i.e., one literal tilde, |
| 815 | and then one literal "X". However, by using "~X", you are assuming that |
| 816 | no future version of Maketext will use "~X" as a magic escape sequence. |
| 817 | In practice this is not a great problem, since first off you can just |
| 818 | write "~~X" and not worry about it; second off, I doubt I'll add lots |
| 819 | of new magic characters to bracket notation; and third off, you |
| 820 | aren't likely to want literal "~" characters in your messages anyway, |
| 821 | since it's not a character with wide use in natural language text. |
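
A short sketch of the tilde escapes in use, assuming a hypothetical project class C<Demo::L10N> with an _AUTO lexicon in C<Demo::L10N::en> (both names are invented for this example):

```perl
use strict;
use warnings;

package Demo::L10N;             # hypothetical project class
use parent 'Locale::Maketext';

package Demo::L10N::en;         # hypothetical language class
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh = Demo::L10N->get_handle('en') or die "No language handle";

# "~[" and "~]" give literal brackets outside a group.
my $brackets = $lh->maketext('Press ~[Enter~] to accept [_1].', 'the default');

# "~," gives a literal comma inside a group.
my $comma    = $lh->maketext('[,_1,~, more or less]', 42);

# "~~" gives a literal tilde.
my $tilde    = $lh->maketext('about ~~[_1]', 42);

# "~X" currently means a tilde followed by "X".
my $other    = $lh->maketext('~X marks [_1]', 'it');

print "$brackets\n$comma\n$tilde\n$other\n";
```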
| 822 | |
| 823 | Brackets must be balanced -- every openbracket must have |
| 824 | one matching closebracket, and vice versa. So these are all B<invalid>: |
| 825 | |
| 826 | "I ate [quant,_1,rhubarb pie." |
| 827 | "I ate [quant,_1,rhubarb pie[." |
| 828 | "I ate quant,_1,rhubarb pie]." |
| 829 | "I ate quant,_1,rhubarb pie[." |
| 830 | |
| 831 | Currently, bracket groups do not nest. That is, you B<cannot> say: |
| 832 | |
| 833 | "Foo [bar,baz,[quux,quuux]]\n"; |
| 834 | |
| 835 | If you need a notation that's that powerful, use normal Perl: |
| 836 | |
| 837 | %Lexicon = ( |
| 838 | ... |
| 839 | "some_key" => sub { |
| 840 | my $lh = $_[0]; |
| 841 | join '', |
| 842 | "Foo ", |
| 843 | $lh->bar('baz', $lh->quux('quuux')), |
| 844 | "\n", |
| 845 | }, |
| 846 | ... |
| 847 | ); |
| 848 | |
| 849 | Or write the "bar" method so you don't need to pass it the |
| 850 | output from calling quux. |
| 851 | |
| 852 | I do not anticipate that you will need (or particularly want) |
| 853 | to nest bracket groups, but you are welcome to email me with |
| 854 | convincing (real-life) arguments to the contrary. |
| 855 | |
| 856 | =head1 AUTO LEXICONS |
| 857 | |
| 858 | If maketext goes to look in an individual %Lexicon for an entry |
| 859 | for I<key> (where I<key> does not start with an underscore), and |
| 860 | sees none, B<but does see> an entry of "_AUTO" => I<some_true_value>, |
| 861 | then we actually define $Lexicon{I<key>} = I<key> right then and there, |
| 862 | and then use that value as if it had been there all |
| 863 | along. This happens before we even look in any superclass %Lexicons! |
| 864 | |
| 865 | (This is meant to be somewhat like the AUTOLOAD mechanism in |
| 866 | Perl's function call system -- or, looked at another way, |
| 867 | like the L<AutoLoader|AutoLoader> module.) |
| 868 | |
| 869 | I can picture all sorts of circumstances where you just |
| 870 | do not want lookup to be able to fail (since failing |
| 871 | normally means that maketext throws a C<die>, altho |
| 872 | see the next section for greater control over that). But |
| 873 | here's one circumstance where _AUTO lexicons are meant to |
| 874 | be I<especially> useful: |
| 875 | |
| 876 | As you're writing an application, you decide as you go what messages |
| 877 | you need to emit. Normally you'd go to write this: |
| 878 | |
| 879 | if(-e $filename) { |
| 880 | go_process_file($filename) |
| 881 | } else { |
| 882 | print "Couldn't find file \"$filename\"!\n"; |
| 883 | } |
| 884 | |
| 885 | but since you anticipate localizing this, you write: |
| 886 | |
| 887 | use ThisProject::I18N; |
| 888 | my $lh = ThisProject::I18N->get_handle(); |
| 889 | # For the moment, assume that things are set up so |
| 890 | # that we load class ThisProject::I18N::en |
| 891 | # and that that's the class that $lh belongs to. |
| 892 | ... |
| 893 | if(-e $filename) { |
| 894 | go_process_file($filename) |
| 895 | } else { |
| 896 | print $lh->maketext( |
| 897 | "Couldn't find file \"[_1]\"!\n", $filename |
| 898 | ); |
| 899 | } |
| 900 | |
| 901 | Now, right after you've just written the above lines, you'd |
| 902 | normally have to go open the file |
| 903 | ThisProject/I18N/en.pm, and immediately add an entry: |
| 904 | |
| 905 | "Couldn't find file \"[_1]\"!\n" |
| 906 | => "Couldn't find file \"[_1]\"!\n", |
| 907 | |
| 908 | But I consider that somewhat of a distraction from the work |
| 909 | of getting the main code working -- to say nothing of the fact |
| 910 | that I often have to play with the program a few times before |
| 911 | I can decide exactly what wording I want in the messages (which |
| 912 | in this case would require me to go changing three lines of code: |
| 913 | the call to maketext with that key, and then the two lines in |
| 914 | ThisProject/I18N/en.pm). |
| 915 | |
| 916 | However, if you set "_AUTO => 1" in the %Lexicon in |
| 917 | ThisProject/I18N/en.pm (assuming that English (en) is |
| 918 | the language that all your programmers will be using for this |
| 919 | project's internal message keys), then you don't ever have to |
| 920 | go adding lines like this |
| 921 | |
| 922 | "Couldn't find file \"[_1]\"!\n" |
| 923 | => "Couldn't find file \"[_1]\"!\n", |
| 924 | |
| 925 | to ThisProject/I18N/en.pm, because if _AUTO is true there, |
| 926 | then just looking for an entry with the key "Couldn't find |
| 927 | file \"[_1]\"!\n" in that lexicon will cause it to be added, |
| 928 | with that value! |
| 929 | |
| 930 | Note that the reason that keys that start with "_" |
| 931 | are immune to _AUTO isn't anything generally magical about |
| 932 | the underscore character -- I just wanted a way to have most |
| 933 | lexicon keys be autoable, except for possibly a few, and I |
| 934 | arbitrarily decided to use a leading underscore as a signal |
| 935 | to distinguish those few. |
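
The following sketch shows the effect, under the assumption of a hypothetical project class C<Demo::L10N> whose only language class, C<Demo::L10N::en>, has an _AUTO lexicon. It checks the lexicon before and after the first lookup, and then tries a key that starts with an underscore:

```perl
use strict;
use warnings;

package Demo::L10N;             # hypothetical project class
use parent 'Locale::Maketext';

package Demo::L10N::en;         # hypothetical language class
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh  = Demo::L10N->get_handle('en') or die "No language handle";
my $key = qq{Couldn't find file "[_1]"!\n};

# Nobody has written an entry for this key...
my $before = exists $Demo::L10N::en::Lexicon{$key};

# ...but the lookup succeeds anyway, because of _AUTO.
my $text   = $lh->maketext($key, 'notes.txt');

# And now the lexicon has an entry for it.
my $after  = exists $Demo::L10N::en::Lexicon{$key};

# Keys with a leading underscore are not auto-made, so this lookup
# fails (and, with no "fail" attribute set, throws an exception).
my $underscore_ok = eval { $lh->maketext('_not_autoable'); 1 };

print $text;
print "Entry present before: ", ($before ? "yes" : "no"), "\n";
print "Entry present after:  ", ($after  ? "yes" : "no"), "\n";
print "Underscore key failed, as expected\n" unless $underscore_ok;
```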
| 936 | |
| 937 | =head1 CONTROLLING LOOKUP FAILURE |
| 938 | |
| 939 | If you call $lh->maketext(I<key>, ...parameters...), |
| 940 | and there's no entry I<key> in $lh's class's %Lexicon, nor |
| 941 | in the superclass %Lexicon hash, I<and> if we can't auto-make |
| 942 | I<key> (because either it starts with a "_", or because none |
| 943 | of its lexicons have C<_AUTO =E<gt> 1,>), then we have |
| 944 | failed to find a normal way to maketext I<key>. What then |
| 945 | happens in these failure conditions depends on the $lh object's |
| 946 | "fail" attribute. |
| 947 | |
| 948 | If the language handle has no "fail" attribute, maketext |
| 949 | will simply throw an exception (i.e., it calls C<die>, mentioning |
| 950 | the I<key> whose lookup failed, and naming the line number where |
| 951 | the call to $lh->maketext(I<key>,...) was made). |
| 952 | |
| 953 | If the language handle has a "fail" attribute whose value is a |
| 954 | coderef, then $lh->maketext(I<key>,...params...) gives up and calls: |
| 955 | |
| 956 | return &{$that_subref}($lh, $key, @params); |
| 957 | |
| 958 | Otherwise, the "fail" attribute's value should be a string denoting |
| 959 | a method name, so that $lh->maketext(I<key>,...params...) can |
| 960 | give up with: |
| 961 | |
| 962 | return $lh->$that_method_name($key, @params); |
| 963 | |
| 964 | The "fail" attribute can be accessed with the C<fail_with> method: |
| 965 | |
| 966 | # Set to a coderef: |
| 967 | $lh->fail_with( \&failure_handler ); |
| 968 | |
| 969 | # Set to a method name: |
| 970 | $lh->fail_with( 'failure_method' ); |
| 971 | |
| 972 | # Set to nothing (i.e., so failure throws a plain exception) |
| 973 | $lh->fail_with( undef ); |
| 974 | |
| 975 | # Simply read: |
| 976 | $handler = $lh->fail_with(); |
| 977 | |
| 978 | Now, as to what you may want to do with these handlers: Maybe you'd |
| 979 | want to log what key failed for what class, and then die. Maybe |
| 980 | you don't like C<die> and instead you want to send the error message |
| 981 | to STDOUT (or wherever) and then merely C<exit()>. |
| 982 | |
| 983 | Or maybe you don't want to C<die> at all! Maybe you could use a |
| 984 | handler like this: |
| 985 | |
| 986 | # Make all lookups fall back onto an English value, |
| 987 | # but after we log it for later fingerpointing. |
| 988 | my $lh_backup = ThisProject->get_handle('en'); |
| 989 | open(LEX_FAIL_LOG, ">>wherever/lex.log") || die "GNAARGH $!"; |
| 990 | sub lex_fail { |
| 991 | my($failing_lh, $key, @params) = @_; |
| 992 | print LEX_FAIL_LOG scalar(localtime), "\t", |
| 993 | ref($failing_lh), "\t", $key, "\n"; |
| 994 | return $lh_backup->maketext($key,@params); |
| 995 | } |
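
Here is a smaller, self-contained variation on the same idea, as a sketch only. The classes C<Demo::L10N> and C<Demo::L10N::en> and the handler C<note_failure> are hypothetical; the handler remembers each failing key and returns a visibly marked placeholder instead of dying:

```perl
use strict;
use warnings;

package Demo::L10N;             # hypothetical project class
use parent 'Locale::Maketext';

package Demo::L10N::en;         # hypothetical language class, no _AUTO
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( 'Hello' => 'Hello' );

package main;

my $lh = Demo::L10N->get_handle('en') or die "No language handle";

my @failed;   # keys whose lookup failed

# A coderef handler is called with the handle, the key, and the
# parameters that were passed to maketext.
sub note_failure {
    my ($failing_lh, $key, @params) = @_;
    push @failed, $key;
    return '<<' . $key . '>>';
}

$lh->fail_with( \&note_failure );
my $known   = $lh->maketext('Hello');                 # normal lookup
my $unknown = $lh->maketext('Goodbye, [_1]', 'Bob');  # goes to the handler

# With the attribute cleared, the same lookup throws an exception.
$lh->fail_with( undef );
my $survived = eval { $lh->maketext('Goodbye, [_1]', 'Bob'); 1 };

print "$known\n$unknown\n";
print "Failed keys: @failed\n";
print "Without a handler, the lookup died\n" unless $survived;
```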
| 996 | |
| 997 | Some users have said that they think this whole mechanism of |
| 998 | having a "fail" attribute at all seems a rather pointless complication. |
| 999 | But I want Locale::Maketext to be usable for software projects of I<any> |
| 1000 | scale and type; and different software projects have different ideas |
| 1001 | of what the right thing is to do in failure conditions. I could simply |
| 1002 | say that failure always throws an exception, and that if you want to be |
| 1003 | careful, you'll just have to wrap every call to $lh->maketext in an |
| 1004 | S<eval { }>. However, I want programmers to reserve the right (via |
| 1005 | the "fail" attribute) to treat lookup failure as something other than |
| 1006 | an exception of the same level of severity as a config file being |
| 1007 | unreadable, or some essential resource being inaccessible. |
| 1008 | |
| 1009 | One possibly useful value for the "fail" attribute is the method name |
| 1010 | "failure_handler_auto". This is a method defined in class |
| 1011 | Locale::Maketext itself. You set it with: |
| 1012 | |
| 1013 | $lh->fail_with('failure_handler_auto'); |
| 1014 | |
| 1015 | Then when you call $lh->maketext(I<key>, ...parameters...) and |
| 1016 | there's no I<key> in any of those lexicons, maketext gives up with |
| 1017 | |
| 1018 | return $lh->failure_handler_auto($key, @params); |
| 1019 | |
| 1020 | But failure_handler_auto, instead of dying or anything, compiles |
| 1021 | $key, caching it in $lh->{'failure_lex'}{$key} = $compiled, |
| 1022 | and then calls the compiled value, and returns that. (I.e., if |
| 1023 | $key looks like bracket notation, $compiled is a sub, and we return |
| 1024 | &{$compiled}(@params); but if $key is just a plain string, we just |
| 1025 | return that.) |
| 1026 | |
| 1027 | The effect of using "failure_handler_auto" |
| 1028 | is like an AUTO lexicon, except that it 1) compiles $key even if |
| 1029 | it starts with "_", and 2) you have a record in the new hashref |
| 1030 | $lh->{'failure_lex'} of all the keys that have failed for |
| 1031 | this object. This should avoid your program dying -- as long |
| 1032 | as your keys aren't actually invalid as bracket code, and as |
| 1033 | long as they don't try calling methods that don't exist. |
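
As a sketch of that behavior (the classes C<Demo::L10N> and C<Demo::L10N::en> are hypothetical, and the lexicon deliberately has no _AUTO entry):

```perl
use strict;
use warnings;

package Demo::L10N;             # hypothetical project class
use parent 'Locale::Maketext';

package Demo::L10N::en;         # hypothetical language class, no _AUTO
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( 'Hello' => 'Hello' );

package main;

my $lh = Demo::L10N->get_handle('en') or die "No language handle";
$lh->fail_with('failure_handler_auto');

# Neither key is in the lexicon, so both go through the handler,
# which compiles the key itself and calls the result.
my $text  = $lh->maketext('Found [_1] files in [_2]', 3, '/tmp');
my $under = $lh->maketext('_odd [_1]', 'key');   # leading "_" is fine here

# The handle keeps a record of every key that failed.
my @failed = sort keys %{ $lh->{'failure_lex'} || {} };

print "$text\n$under\n";
print "Failed key: $_\n" for @failed;
```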
| 1034 | |
| 1035 | "failure_handler_auto" may not be exactly what you want, but I |
| 1036 | hope it at least shows you that maketext failure can be mitigated |
| 1037 | in any number of very flexible ways. If you can formalize exactly |
| 1038 | what you want, you should be able to express that as a failure |
| 1039 | handler. You can even make it default for every object of a given |
| 1040 | class, by setting it in that class's init: |
| 1041 | |
| 1042 | sub init { |
| 1043 | my $lh = $_[0]; # a newborn handle |
| 1044 | $lh->SUPER::init(); |
| 1045 | $lh->fail_with('my_clever_failure_handler'); |
| 1046 | return; |
| 1047 | } |
| 1048 | sub my_clever_failure_handler { |
| 1049 | ...your clever things here... |
| 1050 | } |
| 1051 | |
| 1052 | =head1 HOW TO USE MAKETEXT |
| 1053 | |
| 1054 | Here is a brief checklist on how to use Maketext to localize |
| 1055 | applications: |
| 1056 | |
| 1057 | =over |
| 1058 | |
| 1059 | =item * |
| 1060 | |
| 1061 | Decide what system you'll use for lexicon keys. If you insist, |
| 1062 | you can use opaque IDs (if you're nostalgic for C<catgets>), |
| 1063 | but I have better suggestions in the |
| 1064 | section "Entries in Each Lexicon", above. Assuming you opt for |
| 1065 | meaningful keys that double as values (like "Minimum ([_1]) is |
| 1066 | larger than maximum ([_2])!\n"), you'll have to settle on what |
| 1067 | language those should be in. For the sake of argument, I'll |
| 1068 | call this English, specifically American English, "en-US". |
| 1069 | |
| 1070 | =item * |
| 1071 | |
| 1072 | Create a class for your localization project. This is |
| 1073 | the name of the class that you'll use in the idiom: |
| 1074 | |
| 1075 | use Projname::L10N; |
| 1076 | my $lh = Projname::L10N->get_handle(...) || die "Language?"; |
| 1077 | |
| 1078 | Assuming you call your class Projname::L10N, create a class |
| 1079 | consisting minimally of: |
| 1080 | |
| 1081 | package Projname::L10N; |
| 1082 | use base qw(Locale::Maketext); |
| 1083 | ...any methods you might want all your languages to share... |
| 1084 | |
| 1085 | # And, assuming you want the base class to be an _AUTO lexicon, |
| 1086 | # as is discussed a few sections up: |
| 1087 | %Lexicon = ('_AUTO' => 1); |
| 1088 | 1; |
| 1089 | |
| 1090 | =item * |
| 1091 | |
| 1092 | Create a class for the language your internal keys are in. Name |
| 1093 | the class after the language-tag for that language, in lowercase, |
| 1094 | with dashes changed to underscores. Assuming your project's first |
| 1095 | language is US English, you should call this Projname::L10N::en_us. |
| 1096 | It should consist minimally of: |
| 1097 | |
| 1098 | package Projname::L10N::en_us; |
| 1099 | use base qw(Projname::L10N); |
| 1100 | %Lexicon = ( |
| 1101 | '_AUTO' => 1, |
| 1102 | ); |
| 1103 | 1; |
| 1104 | |
| 1105 | (For the rest of this section, I'll assume that this "first |
| 1106 | language class" of Projname::L10N::en_us has an |
| 1107 | _AUTO lexicon.) |
| 1108 | |
| 1109 | =item * |
| 1110 | |
| 1111 | Go and write your program. Everywhere in your program where |
| 1112 | you would say: |
| 1113 | |
| 1114 | print "Foobar $thing stuff\n"; |
| 1115 | |
| 1116 | instead do it thru maketext, using no variable interpolation in |
| 1117 | the key: |
| 1118 | |
| 1119 | print $lh->maketext("Foobar [_1] stuff\n", $thing); |
| 1120 | |
| 1121 | If you get tired of constantly saying C<print $lh-E<gt>maketext>, |
| 1122 | consider making a functional wrapper for it, like so: |
| 1123 | |
| 1124 | use Projname::L10N; |
| 1125 | use vars qw($lh); |
| 1126 | $lh = Projname::L10N->get_handle(...) || die "Language?"; |
| 1127 | sub pmt (@) { print( $lh->maketext(@_)) } |
| 1128 | # "pmt" is short for "Print MakeText" |
| 1129 | $Carp::Verbose = 1; |
| 1130 | # so if maketext fails, we see what made the call to pmt |
| 1131 | |
| 1132 | Besides whole phrases meant for output, anything language-dependent |
| 1133 | should be put into the class Projname::L10N::en_us, |
| 1134 | whether as methods, or as lexicon entries -- this is discussed |
| 1135 | in the section "Entries in Each Lexicon", above. |
| 1136 | |
| 1137 | =item * |
| 1138 | |
| 1139 | Once the program is otherwise done, and once its localization for |
| 1140 | the first language works right (via the data and methods in |
| 1141 | Projname::L10N::en_us), you can get together the data for translation. |
| 1142 | If your first language lexicon isn't an _AUTO lexicon, then you already |
| 1143 | have all the messages explicitly in the lexicon (or else you'd be |
| 1144 | getting exceptions thrown when you call $lh->maketext to get |
| 1145 | messages that aren't in there). But if you were (advisedly) lazy and are |
| 1146 | using an _AUTO lexicon, then you've got to make a list of all the phrases |
| 1147 | that you've so far been letting _AUTO generate for you. There are very |
| 1148 | many ways to assemble such a list. The most straightforward is to simply |
| 1149 | grep the source for every occurrence of "maketext" (or calls |
| 1150 | to wrappers around it, like the above C<pmt> function), and to log the |
| 1151 | following phrase. |
| 1152 | |
| 1153 | =item * |
| 1154 | |
| 1155 | You may at this point want to consider whether your base class |
| 1156 | (Projname::L10N) that all lexicons inherit from (Projname::L10N::en, |
| 1157 | Projname::L10N::es, etc.) should be an _AUTO lexicon. It may be true |
| 1158 | that in theory, all needed messages will be in each language class; |
| 1159 | but in the presumably unlikely or "impossible" case of lookup failure, |
| 1160 | you should consider whether your program should throw an exception, |
| 1161 | emit text in English (or whatever your project's first language is), |
| 1162 | or some more complex solution as described in the section |
| 1163 | "Controlling Lookup Failure", above. |
| 1164 | |
| 1165 | =item * |
| 1166 | |
| 1167 | Submit all messages/phrases/etc. to translators. |
| 1168 | |
| 1169 | (You may, in fact, want to start with localizing to I<one> other language |
| 1170 | at first, if you're not sure that you've properly abstracted the |
| 1171 | language-dependent parts of your code.) |
| 1172 | |
| 1173 | Translators may request clarification of the situation in which a |
| 1174 | particular phrase is found. For example, in English we are entirely happy |
| 1175 | saying "I<n> files found", regardless of whether we mean "I looked for files, |
| 1176 | and found I<n> of them" or the rather distinct situation of "I looked for |
| 1177 | something else (like lines in files), and along the way I saw I<n> |
| 1178 | files." This may involve rethinking things that you thought quite clear: |
| 1179 | should "Edit" on a toolbar be a noun ("editing") or a verb ("to edit")? Is |
| 1180 | there already a conventionalized way to express that menu option, separate |
| 1181 | from the target language's normal word for "to edit"? |
| 1182 | |
| 1183 | In all cases where the very common phenomenon of quantification |
| 1184 | (saying "I<N> files", for B<any> value of N) |
| 1185 | is involved, each translator should make clear what dependencies the |
| 1186 | number causes in the sentence. In many cases, dependency is |
| 1187 | limited to words adjacent to the number, in places where you might |
| 1188 | expect them ("I found the-?PLURAL I<N> |
| 1189 | empty-?PLURAL directory-?PLURAL"), but in some cases there are |
| 1190 | unexpected dependencies ("I found-?PLURAL ..."!) as well as long-distance |
| 1191 | dependencies ("The I<N> directory-?PLURAL could not be deleted-?PLURAL"!). |
| 1192 | |
| 1193 | Remind the translators to consider the case where N is 0: |
| 1194 | "0 files found" isn't exactly natural-sounding in any language, but it |
| 1195 | may be unacceptable in many -- or it may condition special |
| 1196 | kinds of agreement (similar to English "I didN'T find ANY files"). |
| 1197 | |
| 1198 | Remember to ask your translators about numeral formatting in their |
| 1199 | language, so that you can override the C<numf> method as |
| 1200 | appropriate. Typical variables in number formatting are: what to |
| 1201 | use as a decimal point (comma? period?); what to use as a thousands |
| 1202 | separator (space? nonbreaking space? comma? period? small |
| 1203 | middot? prime? apostrophe?); and even whether the so-called "thousands |
| 1204 | separator" is actually for every third digit -- I've heard reports of |
| 1205 | two hundred thousand being expressible as "2,00,000" for some Indian |
| 1206 | (Subcontinental) languages, besides the less surprising "S<200 000>", |
| 1207 | "200.000", "200,000", and "200'000". Also, using a set of numeral |
| 1208 | glyphs other than the usual ASCII "0"-"9" might be appreciated, as via |
| 1209 | C<tr/0-9/\x{0966}-\x{096F}/> for getting digits in Devanagari script |
| 1210 | (for Hindi, Konkani, others). |
| 1211 | |
| 1212 | The basic C<quant> method that Locale::Maketext provides should be |
| 1213 | good for many languages. For some languages, it might be useful |
| 1214 | to modify it (or its constituent C<numerate> method) |
| 1215 | to take a plural form in the two-argument call to C<quant> |
| 1216 | (as in "[quant,_1,files]") if |
| 1217 | it's all-around easier to infer the singular form from the plural, than |
| 1218 | to infer the plural form from the singular. |
| 1219 | |
| 1220 | But for other languages (as is discussed at length |
| 1221 | in L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13>), simple |
| 1222 | C<quant>/C<numerate> is not enough. For the particularly problematic |
| 1223 | Slavic languages, what you may need is a method which you provide |
| 1224 | with the number, the citation form of the noun to quantify, and |
| 1225 | the case and gender that the sentence's syntax projects onto that |
| 1226 | noun slot. The method would then be responsible for determining |
| 1227 | what grammatical number that numeral projects onto its noun phrase, |
| 1228 | and what case and gender it may override the normal case and gender |
| 1229 | with; and then it would look up the noun in a lexicon providing |
| 1230 | all needed inflected forms. |
| 1231 | |
| 1232 | =item * |
| 1233 | |
| 1234 | You may also wish to discuss with the translators the question of |
| 1235 | how to relate different subforms of the same language tag, |
| 1236 | considering how this interacts with C<get_handle>'s treatment of |
| 1237 | these. For example, if a user accepts interfaces in "en, fr", and |
| 1238 | you have interfaces available in "en-US" and "fr", what should |
| 1239 | they get? You may wish to resolve this by establishing that "en" |
| 1240 | and "en-US" are effectively synonymous, by having one class |
| 1241 | zero-derive from the other. |
| 1242 | |
| 1243 | For some languages this issue may never come up (Danish is rarely |
| 1244 | expressed as "da-DK", but instead is just "da"). And for other |
| 1245 | languages, the whole concept of a "generic" form may verge on |
| 1246 | being uselessly vague, particularly for interfaces involving voice |
| 1247 | media in forms of Arabic or Chinese. |
| 1248 | |
| 1249 | =item * |
| 1250 | |
| 1251 | Once you've localized your program/site/etc. for all desired |
| 1252 | languages, be sure to show the result (whether live, or via |
| 1253 | screenshots) to the translators. Once they approve, make every |
| 1254 | effort to have it then checked by at least one other speaker of |
| 1255 | that language. This holds true even when (or especially when) the |
| 1256 | translation is done by one of your own programmers. Some |
| 1257 | kinds of systems may be harder to find testers for than others, |
| 1258 | depending on the amount of domain-specific jargon and concepts |
| 1259 | involved -- it's easier to find people who can tell you whether |
| 1260 | they approve of your translation for "delete this message" in an |
| 1261 | email-via-Web interface, than to find people who can give you |
| 1262 | an informed opinion on your translation for "attribute value" |
| 1263 | in an XML query tool's interface. |
| 1264 | |
| 1265 | =back |
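
Two of the checklist points above, overriding C<numf> for a language and zero-deriving "en" from "en-US", can be sketched together as follows. All class names under C<Demo::L10N> and the sample phrase are hypothetical; the French formatting rule shown (space for thousands, comma for the decimal point) is only an assumption for the example, and is exactly the kind of thing you should confirm with your translators.

```perl
use strict;
use warnings;

{   package Demo::L10N;             # hypothetical project class
    use parent 'Locale::Maketext';
}
{   package Demo::L10N::en_us;      # first language, with _AUTO
    use parent -norequire, 'Demo::L10N';
    our %Lexicon = ( '_AUTO' => 1 );
}
{   package Demo::L10N::en;         # zero-derived: "en" acts like "en-US"
    use parent -norequire, 'Demo::L10N::en_us';
}
{   package Demo::L10N::fr;         # a translated language
    use parent -norequire, 'Demo::L10N';
    our %Lexicon = (
        'You have [numf,_1] bytes free.' => 'Il vous reste [numf,_1] octets.',
    );

    # Override numf: start from the inherited formatting, then swap
    # the separators (assumed convention, see the note above).
    sub numf {
        my ($lh, $num) = @_;
        my $formatted = $lh->SUPER::numf($num);
        $formatted =~ tr/,./ ,/;
        return $formatted;
    }
}

package main;

my $en = Demo::L10N->get_handle('en') or die "No English handle";
my $fr = Demo::L10N->get_handle('fr') or die "No French handle";

my $en_text = $en->maketext('You have [numf,_1] bytes free.', 1234567);
my $fr_text = $fr->maketext('You have [numf,_1] bytes free.', 1234567);

print ref($en), ": $en_text\n";
print ref($fr), ": $fr_text\n";
```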
| 1266 | |
| 1267 | =head1 SEE ALSO |
| 1268 | |
| 1269 | I recommend reading all of these: |
| 1270 | |
| 1271 | L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13> -- my I<The Perl |
| 1272 | Journal> article about Maketext. It explains many important concepts |
| 1273 | underlying Locale::Maketext's design, and some insight into why |
| 1274 | Maketext is better than the plain old approach of just having |
| 1275 | message catalogs that are just databases of sprintf formats. |
| 1276 | |
| 1277 | L<File::Findgrep|File::Findgrep> is a sample application/module |
| 1278 | that uses Locale::Maketext to localize its messages. For a larger |
| 1279 | internationalized system, see also L<Apache::MP3>. |
| 1280 | |
| 1281 | L<I18N::LangTags|I18N::LangTags>. |
| 1282 | |
| 1283 | L<Win32::Locale|Win32::Locale>. |
| 1284 | |
| 1285 | RFC 3066, I<Tags for the Identification of Languages>, |
| 1286 | is at http://sunsite.dk/RFC/rfc/rfc3066.html |
| 1287 | |
| 1288 | RFC 2277, I<IETF Policy on Character Sets and Languages> |
| 1289 | is at http://sunsite.dk/RFC/rfc/rfc2277.html -- much of it is |
| 1290 | just things of interest to protocol designers, but it explains |
| 1291 | some basic concepts, like the distinction between locales and |
| 1292 | language-tags. |
| 1293 | |
| 1294 | The manual for GNU C<gettext>. The gettext dist is available in |
| 1295 | C<ftp://prep.ai.mit.edu/pub/gnu/> -- get |
| 1296 | a recent gettext tarball and look in its "doc/" directory; there's |
| 1297 | an easily browsable HTML version in there. The |
| 1298 | gettext documentation asks lots of questions worth thinking |
| 1299 | about, even if some of their answers are sometimes wonky, |
| 1300 | particularly where they start talking about pluralization. |
| 1301 | |
| 1302 | The Locale/Maketext.pm source. Observe that the module is much |
| 1303 | shorter than its documentation! |
| 1304 | |
| 1305 | =head1 COPYRIGHT AND DISCLAIMER |
| 1306 | |
| 1307 | Copyright (c) 1999-2004 Sean M. Burke. All rights reserved. |
| 1308 | |
| 1309 | This library is free software; you can redistribute it and/or modify |
| 1310 | it under the same terms as Perl itself. |
| 1311 | |
| 1312 | This program is distributed in the hope that it will be useful, but |
| 1313 | without any warranty; without even the implied warranty of |
| 1314 | merchantability or fitness for a particular purpose. |
| 1315 | |
| 1316 | =head1 AUTHOR |
| 1317 | |
| 1318 | Sean M. Burke C<sburke@cpan.org> |
| 1319 | |
| 1320 | =cut |