.\" Copyright (c) 1989 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms are permitted
.\" provided that the above copyright notice and this paragraph are
.\" duplicated in all such forms and that any documentation,
.\" advertising materials, and other materials related to such
.\" distribution and use acknowledge that the software was developed
.\" by the University of California, Berkeley.  The name of the
.\" University may not be used to endorse or promote products derived
.\" from this software without specific prior written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\"	@(#)vis.3	5.2 (Berkeley) %G%
.\"
.TH CENCODE 3 ""
.UC 7
.AT 3
.SH NAME
cencode, cdecode \- encode (decode) non-printing characters
.SH SYNOPSIS
.nf
.B #include <cencode.h>
.PP
.B char *cencode(c, cflag)
.B char c;
.B int cflag;
.PP
.B cdecode(c, cp, dflag)
.B char c, *cp;
.B int dflag;
.SH DESCRIPTION
\fICencode\fP converts a non-printing character into a printable,
invertible representation; \fIcdecode\fP converts
that representation back to the original character.
Both functions pass through printable characters, and
are useful for filtering a stream of characters
to and from a visual representation.
.PP
By default, \fIcencode\fP treats characters for which isgraph(c)
is true, plus space, tab, and
newline, as printable.  Setting CENC_WHITE in
cflag causes space, tab, and newline to be
encoded as well.
.PP
There are three forms of representation.
Any combination of them may be requested,
since some forms encode only a subset
of the non-printable characters.
All
forms use the backslash character to introduce the visual
sequence; two backslashes are used to represent a
real backslash.  Each form is selected by setting its name in cflag.
The forms are:
.TP
.I CENC_CTYPE
Use C-style backslash sequences where possible.  The following
sequences are used to represent the indicated character:
.nf

\\n - NL  (012)
\\r - CR  (015)	
\\b - BS  (010)
\\a - BEL (007)
\\v - VT  (013)
\\t - HT  (011)
\\f - NP  (014)
\\000 - NUL (000)

.fi
These are the only characters that are converted using CENC_CTYPE.
The more familiar abbreviation of \\0 for NUL cannot be used, since
it could be mistaken for a different octal number if the sequence
is followed by other octal digits.
.PP
.TP
.I CENC_GRAPHIC
Use an M to represent meta characters (chars with the 8th bit set),
and use hat (^) to represent control characters (iscntrl(c)).  The
following forms are possible:
.nf

\\^C  - Represents control character 'C'.  Spans 
	  characters 000 through 037, and 0177 (as \\^?).
\\M-C - Represents character 'C' with the 8th bit set.  
	  Spans characters 0240 (0241 if CENC_WHITE is set)
	  through 0376.
\\M^C - Represents control character 'C' with the 8th 
	  bit set.  Spans characters 0200 through 0237, 
	  and 0377 (as \\M^?).

.fi
The only characters that cannot be displayed using CENC_GRAPHIC
are space and meta-space, and only when CENC_WHITE is set.
.TP
.I CENC_OCTAL
Use a three-digit octal sequence.  The form is:
.nf

\\ddd

.fi
where d represents an octal digit.  All non-printing characters
can be displayed in this form.
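.PP
The graphic and octal forms described above can be sketched in C as follows.
This is an illustrative sketch only, not the library source: the helper
names graphic_form and octal_form are hypothetical, and the sketch ignores
CENC_WHITE, so meta-space (0240) is always given a graphic form.
.nf

```c
/*
 * Hypothetical helper (not part of the library): write the graphic
 * form of c into out, which must hold at least 5 bytes.
 * Returns 1 on success, or 0 (out untouched) if c is neither a
 * control character nor a meta character.
 */
static int graphic_form(unsigned char c, char *out)
{
	int meta = (c & 0200) != 0;		/* 8th bit set */
	unsigned char low = c & 0177;		/* character without the 8th bit */
	int ctl = (low < 040 || low == 0177);	/* control character or DEL */
	char *p = out;

	if (!meta && !ctl)
		return 0;
	*p++ = '\\';
	if (meta)
		*p++ = 'M';
	if (ctl) {
		*p++ = '^';
		/* flipping bit 0100 maps 003 to 'C' and 0177 to '?' */
		*p++ = (char)(low ^ 0100);
	} else {
		*p++ = '-';
		*p++ = (char)low;
	}
	*p = '\0';
	return 1;
}

/*
 * Hypothetical helper: write the three-digit octal form of c
 * into out, which must hold at least 5 bytes.
 */
static void octal_form(unsigned char c, char *out)
{
	out[0] = '\\';
	out[1] = (char)('0' + ((c >> 6) & 03));
	out[2] = (char)('0' + ((c >> 3) & 07));
	out[3] = (char)('0' + (c & 07));
	out[4] = '\0';
}
```

.fi
Under these assumptions a caller would try graphic_form first and fall back
to octal_form when it returns 0 for a character that must still be encoded.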
.PP
\fICencode\fP returns a pointer to a string that contains the
printable representation of the character passed in c.  If the character
cannot be encoded (because none of the selected formats can
encode that character), it is placed in the returned
string unencoded.  Note that if NUL is not encoded, it is placed
in the string as two NULs.  A caller that expects to encounter
this situation should always extract one character from
the returned string before checking for the terminating NUL.  If CENC_OCTAL
is selected, in addition to any other formats, this situation
can never arise.  Calling \fIcencode\fP with no requested formats
results in no encoding being done; however, backslashes are
still doubled.
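.PP
The advice about extracting one character before checking for NUL
amounts to a do-while copy loop.  The following is a minimal sketch,
assuming a hypothetical stand-in named stub_encode that behaves as
described above for \fIcencode\fP with no formats selected; neither
stub_encode nor emit_encoded is part of the library.
.nf

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for cencode() with no formats selected:
 * a backslash is doubled, and every other character (NUL included)
 * is returned unencoded.  An unencoded NUL thus yields two NULs.
 */
static const char *stub_encode(char c)
{
	static char buf[3];

	buf[0] = c;
	buf[1] = (c == '\\') ? '\\' : '\0';
	buf[2] = '\0';
	return buf;
}

/*
 * Append the encoding of c to out and return the number of bytes
 * written.  The do-while copies the first byte unconditionally, so
 * an unencoded NUL is emitted once before the terminator is tested.
 */
static size_t emit_encoded(char c, char *out)
{
	const char *s = stub_encode(c);
	size_t n = 0;

	do {
		out[n++] = *s++;
	} while (*s != '\0');
	return n;
}
```

.fi
A plain while loop that tested for NUL first would silently drop an
unencoded NUL from the output stream.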
.PP
Using \fIcdecode\fP to decode previously encoded data takes more
care.  Characters are passed to \fIcdecode\fP one at a time
until the decoder recognizes a character to return.  There are
five return codes that need to be handled:
.TP
.I CDEC_NEEDMORE
The decoder is not done recognizing a control sequence; pass it
another character in c.
.TP
.I CDEC_OK
A character was recognized and has been placed in *cp.
.TP
.I CDEC_OKPUSH
A character was recognized and has been placed in *cp; however,
the character that was just passed in c is not yet needed.
When processing a stream of characters, the current character
should be used again.
.TP
.I CDEC_NOCHAR
A sequence which represents no character was detected.
.TP
.I CDEC_SYNBAD
An unrecognized backslash sequence was detected.  The decoder
was automatically reset to a normal state.  All characters since
the last un-escaped backslash character constitute the 
unrecognized sequence.
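.PP
The character-at-a-time protocol can be illustrated with a much smaller
decoder.  The sketch below is not the library's decoder: it handles only
a doubled backslash and the three-digit octal form, its names (sk_decode,
struct sk_state, SK_OK, SK_NEEDMORE, SK_SYNBAD) are hypothetical, and
keeping the state in a caller-supplied structure is an assumption made
for illustration.
.nf

```c
/* Hypothetical return codes, modelled on those described above. */
enum { SK_OK, SK_NEEDMORE, SK_SYNBAD };

/* Decoder state; initialize all members to zero before first use. */
struct sk_state {
	int in_escape;	/* nonzero after an unescaped backslash */
	int ndigits;	/* octal digits consumed so far */
	int value;	/* value accumulated from those digits */
};

/* Feed one character; on SK_OK the decoded character is in *cp. */
static int sk_decode(struct sk_state *st, char c, char *cp)
{
	if (!st->in_escape) {
		if (c == '\\') {		/* start of a sequence */
			st->in_escape = 1;
			st->ndigits = 0;
			st->value = 0;
			return SK_NEEDMORE;
		}
		*cp = c;			/* ordinary character */
		return SK_OK;
	}
	if (st->ndigits == 0 && c == '\\') {	/* doubled backslash */
		st->in_escape = 0;
		*cp = '\\';
		return SK_OK;
	}
	/* the first digit may not exceed 3, so the value fits in 8 bits */
	if (c >= '0' && c <= (st->ndigits == 0 ? '3' : '7')) {
		st->value = (st->value << 3) | (c - '0');
		if (++st->ndigits < 3)
			return SK_NEEDMORE;
		st->in_escape = 0;
		*cp = (char)st->value;
		return SK_OK;
	}
	st->in_escape = 0;			/* reset to the normal state */
	return SK_SYNBAD;
}
```

.fi
Because every sequence in this sketch has a fixed length, it never needs
an equivalent of CDEC_OKPUSH or CDEC_NOCHAR.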
.PP
When the caller has finished feeding characters to \fIcdecode\fP,
\fIcdecode\fP
should be called one last time with dflag set to CDEC_END.  This extracts
any remaining character.
The following code fragment
illustrates the use of \fIcdecode\fP:
.nf

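	int c;		/* int, not char, so that EOF can be detected */
	char nc;	/* receives each decoded character */

	while ((c = getchar()) != EOF) {
	again:
		switch (cdecode((char)c, &nc, 0)) {
		case CDEC_NEEDMORE:	/* sequence incomplete; feed more */
		case CDEC_NOCHAR:	/* sequence represents no character */
			break;
		case CDEC_OK:
			putchar(nc);
			break;
		case CDEC_OKPUSH:	/* c was not consumed; pass it again */
			putchar(nc);
			goto again;
		case CDEC_SYNBAD:
			fprintf(stderr, "Bad sequence\\n");
			exit(1);
		}
	}
	/* extract any character still pending in the decoder */
	if (cdecode((char)0, &nc, CDEC_END) == CDEC_OK)
		putchar(nc);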

.fi
.SH "SEE ALSO"
vis(1)