.\" Automatically generated by Pod::Man v1.37, Pod::Parser v1.32
.\" ========================================================================
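.de Sh \" Subsection heading
.br
.if t .Sp
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..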
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. | will give a
.\" real vertical bar. \*(C+ will give a nicer C++. Capital omega is used to
.\" do unbreakable dashes and therefore won't be available. \*(C` and \*(C'
.\" expand to `' in nroff, nothing in troff, for use with C<>.
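.tr \(*W-|\(bv
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` `
.    ds C' '
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}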
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.if \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.hy 0
.if n .na
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
. \" fudge factors for nroff and troff
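.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}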
. \" simple accents for nroff and troff
. ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
. ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
. ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
. ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
. ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
. ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
. \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
. \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
. \" for low resolution devices (crt and lpr)
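.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}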
.\" ========================================================================
.IX Title "Text::Soundex 3"
.TH Text::Soundex 3 "2001-09-21" "perl v5.8.8" "Perl Programmers Reference Guide"
.SH "NAME"
Text::Soundex \- Implementation of the Soundex Algorithm as Described by Knuth
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 7
\&  use Text::Soundex;
\&
\&  $code = soundex $string;            # get soundex code for a string
\&  @codes = soundex @list;             # get list of codes for list of strings
\&
\&  # set value to be returned for strings without soundex code
\&  $soundex_nocode = 'Z000';
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This module implements the soundex algorithm as described by Donald Knuth
in Volume 3 of \fBThe Art of Computer Programming\fR. The algorithm is
intended to hash words (in particular surnames) into a small space using a
simple model that approximates the sound of the word when spoken by an English
speaker. Each word is reduced to a four-character string: an uppercase letter
followed by three digits.
.PP
If there is no soundex code representation for a string (for example, a
string containing no letters), the value of \f(CW$soundex_nocode\fR is
returned. This is initially set to \f(CW\*(C`undef\*(C'\fR, but many people seem
to prefer an \fIunlikely\fR value such as \f(CW\*(C`Z000\*(C'\fR (how unlikely
it is depends on the data set being dealt with). Any value can be assigned to
\&\f(CW$soundex_nocode\fR.
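.PP
The steps behind these codes can be sketched in Perl. The helper below,
\f(CWsoundex_code\fR, is hypothetical and written for illustration; it is not
taken from the module's source, and its \f(CW$nocode\fR argument stands in for
\f(CW$soundex_nocode\fR. It assumes rules that reproduce the codes listed on
this page: keep the first letter, map letters to digits with the usual table
(B F P V = 1, C G J K Q S X Z = 2, D T = 3, L = 4, M N = 5, R = 6, which this
page does not itself list), let vowels and H, W and Y act as separators that
are dropped afterwards, collapse runs of the same digit, and pad or truncate
to three digits. Some other soundex variants, such as the one used for US
census indexing, treat letters separated only by H or W as adjacent and can
give different codes for some names. The demo line uses \f(CWsay\fR, so it
needs Perl 5.10 or later.
.Vb 26
```perl
use strict;
use warnings;
use feature 'say';    # say needs Perl 5.10 or later

# Hypothetical helper: soundex code for one string, or $nocode when the
# string contains no letters A-Z (after upper-casing).
sub soundex_code {
    my ($word, $nocode) = @_;
    my $s = uc $word;
    $s =~ tr/A-Z//cd;                 # drop everything except A-Z
    return $nocode if $s eq '';       # nothing left to encode
    my $first = substr $s, 0, 1;      # the first letter is kept as a letter
    # Map letters to digits; vowels and H, W, Y become 0 placeholders.
    (my $d = $s) =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/;
    my $lead = substr $d, 0, 1;
    $d =~ s/^$lead+//;                # drop the first letter's digit and its repeats
    $d =~ tr/0-9//s;                  # collapse runs of the same digit
    $d =~ tr/0//d;                    # remove the 0 placeholders
    return substr($first . $d . '000', 0, 4);    # pad with zeros, keep 4 chars
}

my $nocode = 'Z000';    # plays the role of $soundex_nocode
say join ' ', map { soundex_code($_, $nocode) } qw(Knuth Lloyd 1984);
# prints: K530 L300 Z000
```
.Ve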
.PP
In scalar context, \f(CW\*(C`soundex\*(C'\fR returns the soundex code of its first
argument; in list context, it returns a list in which each element is the
soundex code for the corresponding argument passed to \f(CW\*(C`soundex\*(C'\fR.
For example,
.PP
.Vb 1
\&  @codes = soundex qw(Mike Stok);
.Ve
.PP
leaves \f(CW@codes\fR containing \f(CW\*(C`('M200', 'S320')\*(C'\fR.
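.PP
The scalar and list behavior described above can be reproduced with Perl's
built-in \f(CWwantarray\fR. The following is a hypothetical re-implementation
for illustration only, not the module's source. It bundles a compact encoder
(assuming vowels and H, W and Y map to 0 and act as separators) so that it
runs on its own, and its \f(CW$soundex_nocode\fR is a local stand-in for the
module's variable.
.Vb 30
```perl
use strict;
use warnings;
use feature 'say';

our $soundex_nocode;    # stand-in for the module's variable; undef by default

# Compact encoder for one string (vowels and H, W, Y map to 0 separators).
sub encode_one {
    my $s = uc shift;
    $s =~ tr/A-Z//cd;
    return $soundex_nocode if $s eq '';
    (my $d = $s) =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/;
    my $lead = substr $d, 0, 1;
    $d =~ s/^$lead+//;
    $d =~ tr/0-9//s;
    $d =~ tr/0//d;
    return substr(substr($s, 0, 1) . $d . '000', 0, 4);
}

# List context: one code per argument. Scalar context: code of the first argument.
sub soundex {
    my @codes = map { encode_one($_) } @_;
    return wantarray ? @codes : $codes[0];
}

my @codes = soundex qw(Mike Stok);    # list context
my $code  = soundex 'Knuth';          # scalar context
say "@codes $code";                   # prints: M200 S320 K530
```
.Ve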
.SH "EXAMPLES"
.IX Header "EXAMPLES"
Knuth's examples of various names and the soundex codes they map to include:
.PP
.Vb 2
\&  Hilbert, Heilbronn -> H416
\&  Lukasiewicz, Lissajous -> L222
.Ve
.PP
Calling \f(CW\*(C`soundex\*(C'\fR directly:
.PP
.Vb 2
\&  $code = soundex 'Knuth';              # $code contains 'K530'
\&  @list = soundex qw(Lloyd Gauss);      # @list contains 'L300', 'G200'
.Ve
.SH "LIMITATIONS"
.IX Header "LIMITATIONS"
Because the soundex algorithm was originally used a \fBlong\fR time ago in the
\s-1US\s0, it considers only the English alphabet and pronunciation.
.PP
Since it maps a large space (arbitrary-length strings) onto a small space (a
single letter plus three digits), no inference can be made about the
similarity of two strings that end up with the same soundex code. For
example, both \f(CW\*(C`Hilbert\*(C'\fR and \f(CW\*(C`Heilbronn\*(C'\fR end up with a soundex code
of \f(CW\*(C`H416\*(C'\fR.
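.PP
One common way to live with this is to use a soundex code only as a coarse
bucketing key: strings that share a code are candidates for a closer
comparison, not matches. The sketch below groups names by code. Both
\f(CWcode_of\fR and \f(CWgroup_by_soundex\fR are hypothetical, illustrative
names that are not part of the module; the compact encoder assumes vowels and
H, W and Y map to 0 and act as separators, so that the block runs on its own.
.Vb 35
```perl
use strict;
use warnings;
use feature 'say';

# Compact encoder (vowels and H, W, Y map to 0 separators); returns
# nothing for strings with no letters.
sub code_of {
    my $s = uc shift;
    $s =~ tr/A-Z//cd;
    return if $s eq '';
    (my $d = $s) =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/;
    my $lead = substr $d, 0, 1;
    $d =~ s/^$lead+//;
    $d =~ tr/0-9//s;
    $d =~ tr/0//d;
    return substr(substr($s, 0, 1) . $d . '000', 0, 4);
}

# Bucket names by soundex code. A shared code only flags names for a
# closer look; it says nothing about how similar they really are.
sub group_by_soundex {
    my %bucket;
    for my $name (@_) {
        my $code = code_of($name);
        next unless defined $code;    # skip strings with no letters
        push @{ $bucket{$code} }, $name;
    }
    return %bucket;
}

my %groups = group_by_soundex(qw(Hilbert Heilbronn Knuth Lukasiewicz Lissajous Lloyd));
say "$_: @{ $groups{$_} }" for sort keys %groups;
# prints:
# H416: Hilbert Heilbronn
# K530: Knuth
# L222: Lukasiewicz Lissajous
# L300: Lloyd
```
.Ve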
.SH "AUTHOR"
.IX Header "AUTHOR"
This code was implemented by Mike Stok (\f(CW\*(C`stok@cybercom.net\*(C'\fR) from the
description given by Knuth. Ian Phillipps (\f(CW\*(C`ian@pipex.net\*(C'\fR) and Rich Pinder
(\f(CW\*(C`rpinder@hsc.usc.edu\*(C'\fR) supplied ideas and spotted mistakes.