This directory contains the GNU DIFF and DIFF3 utilities, version 1.15.
See file COPYING for copying conditions.  To compile and install on
System V, you must edit the makefile according to the comments therein.

Report bugs to bug-gnu-utils@prep.ai.mit.edu

Version 1.15 has the following new features; please see below for details.

   -L (+file-label) option
   -u (+unified) option
   -a and -m options for diff3
   Most output styles can represent incomplete input lines.
   `Text' is defined by ISO 8859.
   diff3 exit status 0 means success, 1 means overlaps, 2 means trouble.

 
This version of diff provides all the features of BSD's diff.
It has these additional features:

   An input file may end in a non-newline character.  If so, its last
   line is called an incomplete line and is distinguished on output
   from a full line.  In the default, -c, and -u output styles, an
   incomplete output line is followed by a diagnostic line that starts
   with \.  With -n, an incomplete line is output without a trailing
   newline.  Other output styles (-D, -e, -f) cannot represent an
   incomplete line, so they pretend that there was a newline, and -e and -f
   also print an error message.  For example, suppose F and G are one-byte
   files that contain just ``f'' and ``g'', respectively.

   Then ``diff F G'' outputs

	1c1
	< f
	\ No newline at end of file
	---
	> g
	\ No newline at end of file

   (The exact diagnostic message may differ, e.g. for non-English locales.)
   ``diff -n F G'' outputs the following without a trailing newline:

	d1 1
	a1 1
	g

   ``diff -e F G'' sends two diagnostics to stderr and the following to stdout:

	1c
	g
	.
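
   The default-style output shown above can be reproduced by a minimal
   Python sketch.  This is not diff's implementation: the helper names
   (split_lines, render, change_1c1) are hypothetical, only a single
   ``1c1'' change is handled, and the diagnostic text is copied from the
   example above.

```python
def split_lines(data):
    """Split bytes into (text, complete) pairs, breaking only on b'\\n'.

    `complete` is False only for a last line that lacks a trailing newline.
    """
    if not data:
        return []
    parts = data.split(b"\n")
    complete_last = data.endswith(b"\n")
    if complete_last:
        parts.pop()  # drop the empty piece after the final newline
    last = len(parts) - 1
    return [(p, i < last or complete_last) for i, p in enumerate(parts)]


def render(prefix, data):
    """Render lines in the default style, flagging an incomplete last line."""
    out = []
    for text, complete in split_lines(data):
        out.append(prefix + " " + text.decode("latin-1"))
        if not complete:
            # Diagnostic line that starts with a backslash.
            out.append("\\ No newline at end of file")
    return out


def change_1c1(old, new):
    """Format the one-hunk case where line 1 of each file differs."""
    return ["1c1"] + render("<", old) + ["---"] + render(">", new)


# F and G are one-byte files containing just "f" and "g".
result = change_1c1(b"f", b"g")
print("\n".join(result))
```

   With complete input such as b"f\n" and b"g\n", the same sketch emits no
   diagnostic lines.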

   A file is considered to be text if its first characters are all in the
   ISO 8859 character set; BSD's diff uses ASCII.

   GNU DIFF has the following additional options:

   -a	Always treat files as text and compare them line-by-line,
	even if they do not appear to be text.

   -B	ignore changes that just insert or delete blank lines.

   -C #
	request -c format and specify number of context lines.

   -F regexp
	in context format, for each unit of differences, show some of
	the last preceding line that matches the specified regexp.

   -H	use heuristics to speed handling of large files that
	have numerous scattered small changes.  The algorithm becomes
	asymptotically linear for such files!

   -I regexp
	ignore changes that just insert or delete lines that
	match the specified regexp.

   -L label
	Use the specified label in file header lines output by the -c option.
	This option may be given zero, one, or two times,
	to affect neither label, just the first file's label, or both labels.
	A file's default label is its name, a tab, and its modification date.

   -N	in directory comparison, if a file is found in only one directory,
	treat it as present but empty in the other directory.

   -p	equivalent to -c -F'^[_a-zA-Z]'.  This is useful for C code
	because it shows which function each change is in.

   -T	print a tab rather than a space before the text of a line
	in normal or context format.  This causes the alignment
	of tabs in the line to look normal.

   -u[#]
	produce unified style output with # context lines (default 3).
	This style is like -c, but it is more compact because context
	lines are printed only once.  Lines from just the first file
	are marked '-'; lines from just the second file are marked '+'.
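
   As a rough illustration of the unified style, the following sketch uses
   difflib from Python's standard library.  difflib is a separate
   implementation, not GNU diff, so header details may differ from what
   ``diff -u'' prints; the file names old.txt and new.txt are made up for
   the example.

```python
import difflib

# Two small "files", given as lists of complete lines.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n"]

# n=3 asks for three context lines, matching the default of -u.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt", n=3)
)
print("".join(diff_lines), end="")
```

   This prints

	--- old.txt
	+++ new.txt
	@@ -1,3 +1,3 @@
	 alpha
	-beta
	+BETA
	 gamma

   Each context line (alpha, gamma) appears once, whereas -c format would
   show it once for each file.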

This version of diff3 has all of BSD diff3's features, with the following
additional features.

   An input file may end in a non-newline character.  With the -m option,
   an incomplete last line stays incomplete.  Other output styles treat
   incomplete lines like diff.

   The file name '-' denotes the standard input.  It can appear at most once.

   diff3 has the following additional options:

   -a	Always treat files as text and compare them line-by-line,
	even if they do not appear to be text.

   -i	Include 'w' and 'q' commands at the end of the output, to write out
	the changed file, thus emulating System V behavior.  One of the edit
	script options -e, -E, -x, -X, -3 must also be specified.

   -m	Apply the edit script to the first file and send the result to
	standard output.  Unlike piping diff3's output to ed(1), this works
	even for binary files and incomplete lines.  -E is assumed if no edit
	script option is specified.  This option is incompatible with -i.

   -L label
	Use the specified label for lines output by the -E and -X options,
	one of which must also be specified.  This option may be given zero,
	one, or two times; the first label marks <<<<<<< lines and the second
	marks >>>>>>> lines.  The default labels are the names of the first and
	third files on the command line.  Thus ``diff3 -L X -L Z -E A B C''
	acts like ``diff3 -E A B C'', except that the output looks like it
	came from files named X and Z rather than from files named A and C.

   Exit status 0 means success, 1 means overlaps were found and -E or -X was
   specified, and 2 means trouble.


GNU DIFF was written by Mike Haertel, David Hayes, Richard Stallman
and Len Tower.  The basic algorithm is described in: "An O(ND)
Difference Algorithm and its Variations", Eugene Myers, Algorithmica
Vol. 1 No. 2, 1986, p 251.

Many bugs were fixed by Paul Eggert.  The unified diff idea and format
are from Wayne Davison.

Suggested projects for improving GNU DIFF:

* Handle very large files by not keeping the entire text in core.

One way to do this is to scan the files sequentially to compute hash
codes of the lines and put the lines in equivalence classes based only
on hash code.  Then compare the files normally.  This will produce
some false matches.

Then scan the two files sequentially again, checking each match to see
whether it is real.  When a match is not real, mark both the
"matching" lines as changed.  Then build an edit script as usual.

The output routines would have to be changed to scan the files
sequentially looking for the text to print.
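
The two-pass idea above could be sketched in Python roughly as follows.
This is only an illustration under stated assumptions, not code from GNU
DIFF: the function names are hypothetical, difflib.SequenceMatcher stands
in for "compare the files normally", and zlib.crc32 is an arbitrary choice
of hash.  Only the hash codes are held in memory; the text is re-read
sequentially to verify each match.

```python
import difflib
import os
import tempfile
import zlib


def line_hashes(path, hash_fn):
    """Pass 1: scan a file sequentially, keeping one hash code per line."""
    with open(path, "rb") as f:
        return [hash_fn(line) for line in f]


def verified_matches(path_a, path_b, hash_fn=zlib.crc32):
    """Return (i, j) line-index pairs whose text really matches."""
    ha = line_hashes(path_a, hash_fn)
    hb = line_hashes(path_b, hash_fn)

    # Compare by hash code only; this may report false matches.
    sm = difflib.SequenceMatcher(None, ha, hb, autojunk=False)
    pairs = [
        (blk.a + k, blk.b + k)
        for blk in sm.get_matching_blocks()
        for k in range(blk.size)
    ]

    # Pass 2: the pairs increase in both files, so a single sequential
    # scan of each file is enough to check every candidate match.
    real = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        ia = ib = -1
        line_a = line_b = b""
        for i, j in pairs:
            while ia < i:
                line_a = fa.readline()
                ia += 1
            while ib < j:
                line_b = fb.readline()
                ib += 1
            if line_a == line_b:
                real.append((i, j))
            # Otherwise both "matching" lines count as changed.
    return real


# Demonstration with a deliberately weak hash (line length), which makes
# the middle lines collide even though their text differs.
with tempfile.TemporaryDirectory() as d:
    pa = os.path.join(d, "a")
    pb = os.path.join(d, "b")
    with open(pa, "wb") as f:
        f.write(b"aaa\nbbb\nccc\n")
    with open(pb, "wb") as f:
        f.write(b"aaa\nxxx\nccc\n")
    real = verified_matches(pa, pb, hash_fn=len)
print(real)
```

Lines absent from the returned pairs are treated as changed, and an edit
script can then be built from the surviving matches as usual.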