usr/src/lib/libc/gen/vis.3

.\" Copyright (c) 1989 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms are permitted
.\" provided that the above copyright notice and this paragraph are
.\" duplicated in all such forms and that any documentation,
.\" advertising materials, and other materials related to such
.\" distribution and use acknowledge that the software was developed
.\" by the University of California, Berkeley.  The name of the
.\" University may not be used to endorse or promote products derived
.\" from this software without specific prior written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\"     @(#)vis.3       5.2 (Berkeley) %G%
.\"
.TH CENCODE 3 ""
.UC 7
.AT 3
.SH NAME
cencode, cdecode \- encode (decode) non-printing characters
.SH SYNOPSIS
.nf
.B #include <cencode.h>
.PP
.B char *cencode(c, cflag)
.B char c;
.B int cflag;
.PP
.B cdecode(c, cp, dflag)
.B char c, *cp;
.B int dflag;
.SH DESCRIPTION
\fICencode\fP converts a non-printing character into a printable,
invertible representation; \fIcdecode\fP converts
that representation back to the original character.
Both functions pass through printable characters, and
are useful for filtering a stream of characters
to and from a visual representation.
.PP
By default, \fIcencode\fP treats characters for which isgraph(c) is
true, along with space, tab, and newline, as printable.  Setting
CENC_WHITE in cflag causes space, tab, and newline to be
encoded as well.
.PP
There are three forms of representation.  Any combination of them
can be requested, independently of one another, since some forms
encode only a subset of the non-printable characters.
All forms use the backslash character to introduce the visual
sequence; two backslashes are used to represent a
real backslash.  The following lists the name of each form
(specified in cflag), and a description:
.TP
.I CENC_CTYPE
Use C-style backslash sequences where possible.  The following
sequences are used to represent the indicated character:
.nf

\\n - NL  (012)
\\r - CR  (015)
\\b - BS  (010)
\\a - BEL (007)
\\v - VT  (013)
\\t - HT  (011)
\\f - NP  (014)
\\000 - NUL (000)

.fi
These are the only characters that are converted using CENC_CTYPE.
The more familiar abbreviation of \\0 for NUL cannot be used, since
it could be misread as part of a longer octal number if the sequence
is followed by other octal digits.
.PP
.TP
.I CENC_GRAPHIC
Use an M to represent meta characters (chars with the 8th bit set),
and use hat (^) to represent control characters (iscntrl(c)).  The
following forms are possible:
.nf

\\^C  - Represents control character 'C'.  Spans
          characters 000 through 037, and 0177 (as \\^?).
\\M-C - Represents character 'C' with the 8th bit set.
          Spans characters 0240 (0241 if CENC_WHITE is set)
          through 0376.
\\M^C - Represents control character 'C' with the 8th
          bit set.  Spans characters 0200 through 0237,
          and 0377 (as \\M^?).

.fi
The only characters that cannot be displayed using CENC_GRAPHIC
are space and meta-space, and only when CENC_WHITE is set.
.TP
.I CENC_OCTAL
Use a three digit octal sequence.  The form is:
.nf

\\ddd

.fi
where d represents an octal digit.  All non-printing characters
can be displayed in this form.
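.PP
As a rough illustration only, the graphic and octal forms described
above could be produced along the following lines.
This is a sketch, not the library's implementation: the names
enc_octal and enc_graphic and the BSLASH macro are hypothetical, and
the caller is assumed to pass only characters that need encoding.
.nf

```c
#include <assert.h>
#include <stddef.h>

#define BSLASH 0134     /* backslash, written as an octal constant */

/* Hypothetical helper: write the three-digit octal form of c into out,
   which must hold at least 5 bytes. */
static void enc_octal(unsigned char c, char *out)
{
        assert(out != NULL);
        out[0] = BSLASH;
        out[1] = '0' + ((c >> 6) & 03);
        out[2] = '0' + ((c >> 3) & 07);
        out[3] = '0' + (c & 07);
        out[4] = 0;
}

/* Hypothetical helper: write the graphic form of c into out (5 bytes).
   Assumes c is a control or meta character. */
static void enc_graphic(unsigned char c, char *out)
{
        unsigned char low = c & 0177;           /* strip the 8th bit */
        int ctl = (low < 040 || low == 0177);   /* control character? */
        int i = 0;

        assert(out != NULL);
        out[i++] = BSLASH;
        if (c & 0200)
                out[i++] = 'M';
        if (ctl)
                out[i++] = '^';
        else
                out[i++] = '-';
        /* Flipping bit 0100 maps 003 to C and 0177 to ?. */
        out[i++] = ctl ? (low ^ 0100) : low;
        out[i] = 0;
}
```

.fi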
.PP
\fICencode\fP returns a pointer to a string that contains the
printable representation of the character passed in c.  If the character
cannot be encoded (because none of the selected formats can
encode that character), it is placed in the returned
string unencoded.  Note that if NUL is not encoded, it is placed
in the string as two NUL characters.  If the caller expects to encounter
this situation, it suffices to always extract one character from
the returned string before checking for the terminating NUL.  If CENC_OCTAL
is selected, in addition to any other formats, this situation
can never arise.  Also, calling \fIcencode\fP with no requested formats
results in no encoding being done; however, backslashes are
still doubled.
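.PP
The advice about extracting one character before testing for NUL can
be sketched as a do-while loop.  The function name copy_encoded is
hypothetical; it takes a string of the kind \fIcencode\fP is described
as returning and does not call \fIcencode\fP itself.
.nf

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy one encoded representation from s to dst and
   return the number of bytes copied.  The first byte is always copied
   before the terminator test, so an unencoded NUL (stored as two NULs)
   yields exactly one NUL byte. */
static size_t copy_encoded(const char *s, char *dst)
{
        size_t n = 0;

        assert(s != NULL && dst != NULL);
        do {
                dst[n++] = *s++;
        } while (*s != 0);
        return n;
}
```

.fi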
.PP
Using \fIcdecode\fP to decode previously encoded data is a little
trickier.  Essentially, characters are passed to \fIcdecode\fP
until the decoder recognizes a character to return.  There are
five return codes which need to be handled:
.TP
.I CDEC_NEEDMORE
The decoder is not done recognizing a control sequence; pass it
another character in c.
.TP
.I CDEC_OK
A character was recognized and has been placed in *cp.
.TP
.I CDEC_OKPUSH
A character was recognized and has been placed in *cp; however,
the character that was just passed in c is not yet needed.
When processing a stream of characters, the current character
should be used again.
.TP
.I CDEC_NOCHAR
A sequence which represents no character was detected.
.TP
.I CDEC_SYNBAD
An unrecognized backslash sequence was detected.  The decoder
was automatically reset to a normal state.  All characters since
the last un-escaped backslash character constitute the
unrecognized sequence.
.PP
When the caller is finished feeding characters to \fIcdecode\fP,
it should be called one last time with dflag set to CDEC_END.
This extracts any remaining character.
The following code fragment illustrates the use of \fIcdecode\fP:
.nf

        int c;
        char nc;
        while ((c = getchar()) != EOF) {
                again:
                switch(cdecode((char)c, &nc, 0)) {
                case CDEC_NEEDMORE:
                case CDEC_NOCHAR:
                        break;
                case CDEC_OK:
                        putchar(nc);
                        break;
                case CDEC_OKPUSH:
                        putchar(nc);
                        goto again;
                case CDEC_SYNBAD:
                        fprintf(stderr, "Bad sequence\\n");
                        exit(1);
                }
        }
        if (cdecode((char)0, &nc, CDEC_END) == CDEC_OK)
                putchar(nc);

.fi
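.PP
The one-character-at-a-time protocol can also be viewed from the
decoder's side.  The sketch below handles only doubled backslashes and
the three-digit octal form.  The names sk_decode, struct sk_state, and
the SK_ codes are hypothetical stand-ins, and the state is passed
explicitly, whereas \fIcdecode\fP takes no state argument.
It is not the library's implementation.
.nf

```c
#include <assert.h>
#include <stddef.h>

#define BSLASH 0134     /* backslash, written as an octal constant */

enum { SK_OK, SK_NEEDMORE, SK_SYNBAD };         /* hypothetical codes */

struct sk_state {
        int escaped;    /* nonzero while inside a backslash sequence */
        int ndigits;    /* octal digits consumed so far */
        int value;      /* value accumulated from those digits */
};

/* Feed one input character; on SK_OK the decoded byte is in *cp. */
static int sk_decode(struct sk_state *st, int c, char *cp)
{
        int max;

        assert(st != NULL && cp != NULL);
        if (!st->escaped) {
                if (c == BSLASH) {              /* start of a sequence */
                        st->escaped = 1;
                        st->ndigits = 0;
                        st->value = 0;
                        return SK_NEEDMORE;
                }
                *cp = (char)c;                  /* ordinary character */
                return SK_OK;
        }
        if (st->ndigits == 0 && c == BSLASH) {  /* doubled backslash */
                st->escaped = 0;
                *cp = BSLASH;
                return SK_OK;
        }
        /* The first digit is limited to 0-3 so the value fits in 8 bits. */
        max = (st->ndigits == 0) ? '3' : '7';
        if (c < '0' || c > max) {
                st->escaped = 0;                /* reset to normal state */
                return SK_SYNBAD;
        }
        st->value = (st->value << 3) | (c - '0');
        if (++st->ndigits < 3)
                return SK_NEEDMORE;
        st->escaped = 0;
        *cp = (char)st->value;
        return SK_OK;
}
```

.fi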
.SH "SEE ALSO"
vis(1)