Commit | Line | Data |
---|---|---|
3013fe88 NW |
1 | /* inflate.c -- Not copyrighted 1992 by Mark Adler |
2 | version c10p1, 10 January 1993 */ | |
3 | ||
4 | /* You can do whatever you like with this source file, though I would | |
5 | prefer that if you modify it and redistribute it that you include | |
6 | comments to that effect with your name and the date. Thank you. | |
7 | [The history has been moved to the file ChangeLog.] | |
8 | */ | |
9 | ||
10 | /* | |
11 | Inflate deflated (PKZIP's method 8 compressed) data. The compression | |
12 | method searches for as much of the current string of bytes (up to a | |
13 | length of 258) in the previous 32K bytes. If it doesn't find any | |
14 | matches (of at least length 3), it codes the next byte. Otherwise, it | |
15 | codes the length of the matched string and its distance backwards from | |
16 | the current position. There is a single Huffman code that codes both | |
17 | single bytes (called "literals") and match lengths. A second Huffman | |
18 | code codes the distance information, which follows a length code. Each | |
19 | length or distance code actually represents a base value and a number | |
20 | of "extra" (sometimes zero) bits to read and add to the base value. At | |
21 | the end of each deflated block is a special end-of-block (EOB) literal/ | |
22 | length code. The decoding process is basically: get a literal/length | |
23 | code; if EOB then done; if a literal, emit the decoded byte; if a | |
24 | length then get the distance and emit the referred-to bytes from the | |
25 | sliding window of previously emitted data. | |
26 | ||
27 | There are (currently) three kinds of inflate blocks: stored, fixed, and | |
28 | dynamic. The compressor deals with some chunk of data at a time, and | |
29 | decides which method to use on a chunk-by-chunk basis. A chunk might | |
30 | typically be 32K or 64K. If the chunk is uncompressible, then the | |
31 | "stored" method is used. In this case, the bytes are simply stored as | |
32 | is, eight bits per byte, with none of the above coding. The bytes are | |
33 | preceded by a count, since there is no longer an EOB code. | |
34 | ||
35 | If the data is compressible, then either the fixed or dynamic methods | |
36 | are used. In the dynamic method, the compressed data is preceded by | |
37 | an encoding of the literal/length and distance Huffman codes that are | |
38 | to be used to decode this block. The representation is itself Huffman | |
39 | coded, and so is preceded by a description of that code. These code | |
40 | descriptions take up a little space, and so for small blocks, there is | |
41 | a predefined set of codes, called the fixed codes. The fixed method is | |
42 | used if the block codes up smaller that way (usually for quite small | |
43 | chunks), otherwise the dynamic method is used. In the latter case, the | |
44 | codes are customized to the probabilities in the current block, and so | |
45 | can code it much better than the pre-determined fixed codes. | |
46 | ||
47 | The Huffman codes themselves are decoded using a multi-level table | |
48 | lookup, in order to maximize the speed of decoding plus the speed of | |
49 | building the decoding tables. See the comments below that precede the | |
50 | lbits and dbits tuning parameters. | |
51 | */ | |
52 | ||
53 | ||
54 | /* | |
55 | Notes beyond the 1.93a appnote.txt: | |
56 | ||
57 | 1. Distance pointers never point before the beginning of the output | |
58 | stream. | |
59 | 2. Distance pointers can point back across blocks, up to 32k away. | |
60 | 3. There is an implied maximum of 7 bits for the bit length table and | |
61 | 15 bits for the actual data. | |
62 | 4. If only one code exists, then it is encoded using one bit. (Zero | |
63 | would be more efficient, but perhaps a little confusing.) If two | |
64 | codes exist, they are coded using one bit each (0 and 1). | |
65 | 5. There is no way of sending zero distance codes--a dummy must be | |
66 | sent if there are none. (History: a pre 2.0 version of PKZIP would | |
67 | store blocks with no distance codes, but this was discovered to be | |
68 | too harsh a criterion.) Valid only for 1.93a. 2.04c does allow | |
69 | zero distance codes, which is sent as one code of zero bits in | |
70 | length. | |
71 | 6. There are up to 286 literal/length codes. Code 256 represents the | |
72 | end-of-block. Note however that the static length tree defines | |
73 | 288 codes just to fill out the Huffman codes. Codes 286 and 287 | |
74 | cannot be used though, since there is no length base or extra bits | |
75 | defined for them. Similarly, there are up to 30 distance codes. | |
76 | However, static trees define 32 codes (all 5 bits) to fill out the | |
77 | Huffman codes, but the last two had better not show up in the data. | |
78 | 7. Unzip can check dynamic Huffman blocks for complete code sets. | |
79 | The exception is that a single code would not be complete (see #4). | |
80 | 8. The five bits following the block type are really the number of | |
81 | literal codes sent minus 257. | |
82 | 9. Length codes 8,16,16 are interpreted as 13 length codes of 8 bits | |
83 | (1+6+6). Therefore, to output three times the length, you output | |
84 | three codes (1+1+1), whereas to output four times the same length, | |
85 | you only need two codes (1+3). Hmm. | |
86 | 10. In the tree reconstruction algorithm, Code = Code + Increment | |
87 | only if BitLength(i) is not zero. (Pretty obvious.) | |
88 | 11. Correction: 4 Bits: # of Bit Length codes - 4 (4 - 19) | |
89 | 12. Note: length code 284 can represent 227-258, but length code 285 | |
90 | really is 258. The last length deserves its own, short code | |
91 | since it gets used a lot in very redundant files. The length | |
92 | 258 is special since 258 - 3 (the min match length) is 255. | |
93 | 13. The literal/length and distance code bit lengths are read as a | |
94 | single stream of lengths. It is possible (and advantageous) for | |
95 | a repeat code (16, 17, or 18) to go across the boundary between | |
96 | the two sets of lengths. | |
97 | */ | |
98 | ||
99 | #ifndef lint | |
caed0dfe | 100 | static char rcsid[] = "$Id: inflate.c,v 0.14 1993/06/10 13:27:04 jloup Exp $"; |
3013fe88 NW |
101 | #endif |
102 | ||
3013fe88 NW |
103 | #include <sys/types.h> |
104 | ||
105 | #include "tailor.h" | |
106 | ||
107 | #if defined(STDC_HEADERS) || !defined(NO_STDLIB_H) | |
108 | # include <stdlib.h> | |
109 | #endif | |
110 | ||
111 | #include "gzip.h" | |
112 | #define slide window | |
113 | ||
114 | /* Huffman code lookup table entry--this entry is four bytes for machines | |
115 | that have 16-bit pointers (e.g. PC's in the small or medium model). | |
116 | Valid extra bits are 0..13. e == 15 is EOB (end of block), e == 16 | |
117 | means that v is a literal, 16 < e < 32 means that v is a pointer to | |
118 | the next table, which codes e - 16 bits, and lastly e == 99 indicates | |
119 | an unused code. If a code with e == 99 is looked up, this implies an | |
120 | error in the data. */ | |
121 | struct huft { | |
122 | uch e; /* number of extra bits or operation */ | |
123 | uch b; /* number of bits in this code or subcode */ | |
124 | union { | |
125 | ush n; /* literal, length base, or distance base */ | |
126 | struct huft *t; /* pointer to next level of table */ | |
127 | } v; | |
128 | }; | |
129 | ||
130 | ||
131 | /* Function prototypes */ | |
132 | int huft_build OF((unsigned *, unsigned, unsigned, ush *, ush *, | |
133 | struct huft **, int *)); | |
134 | int huft_free OF((struct huft *)); | |
135 | int inflate_codes OF((struct huft *, struct huft *, int, int)); | |
136 | int inflate_stored OF((void)); | |
137 | int inflate_fixed OF((void)); | |
138 | int inflate_dynamic OF((void)); | |
139 | int inflate_block OF((int *)); | |
140 | int inflate OF((void)); | |
141 | ||
142 | ||
143 | /* The inflate algorithm uses a sliding 32K byte window on the uncompressed | |
144 | stream to find repeated byte strings. This is implemented here as a | |
145 | circular buffer. The index is updated simply by incrementing and then | |
146 | and'ing with 0x7fff (32K-1). */ | |
147 | /* It is left to other modules to supply the 32K area. It is assumed | |
148 | to be usable as if it were declared "uch slide[32768];" or as just | |
149 | "uch *slide;" and then malloc'ed in the latter case. The definition | |
150 | must be in unzip.h, included above. */ | |
151 | /* unsigned wp; current position in slide */ | |
152 | #define wp outcnt | |
153 | #define flush_output(w) (wp=(w),flush_window()) | |
154 | ||
155 | /* Tables for deflate from PKZIP's appnote.txt. */ | |
156 | static unsigned border[] = { /* Order of the bit length code lengths */ | |
157 | 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15}; | |
158 | static ush cplens[] = { /* Copy lengths for literal codes 257..285 */ | |
159 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31, | |
160 | 35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258, 0, 0}; | |
161 | /* note: see note #13 above about the 258 in this list. */ | |
162 | static ush cplext[] = { /* Extra bits for literal codes 257..285 */ | |
163 | 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, | |
164 | 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0, 99, 99}; /* 99==invalid */ | |
165 | static ush cpdist[] = { /* Copy offsets for distance codes 0..29 */ | |
166 | 1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, | |
167 | 257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145, | |
168 | 8193, 12289, 16385, 24577}; | |
169 | static ush cpdext[] = { /* Extra bits for distance codes */ | |
170 | 0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, | |
171 | 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, | |
172 | 12, 12, 13, 13}; | |
173 | ||
174 | ||
175 | ||
176 | /* Macros for inflate() bit peeking and grabbing. | |
177 | The usage is: | |
178 | ||
179 | NEEDBITS(j) | |
180 | x = b & mask_bits[j]; | |
181 | DUMPBITS(j) | |
182 | ||
183 | where NEEDBITS makes sure that b has at least j bits in it, and | |
184 | DUMPBITS removes the bits from b. The macros use the variable k | |
185 | for the number of bits in b. Normally, b and k are register | |
186 | variables for speed, and are initialized at the beginning of a | |
187 | routine that uses these macros from a global bit buffer and count. | |
188 | ||
189 | If we assume that EOB will be the longest code, then we will never | |
190 | ask for bits with NEEDBITS that are beyond the end of the stream. | |
191 | So, NEEDBITS should not read any more bytes than are needed to | |
192 | meet the request. Then no bytes need to be "returned" to the buffer | |
193 | at the end of the last block. | |
194 | ||
195 | However, this assumption is not true for fixed blocks--the EOB code | |
196 | is 7 bits, but the other literal/length codes can be 8 or 9 bits. | |
197 | (The EOB code is shorter than other codes because fixed blocks are | |
198 | generally short. So, while a block always has an EOB, many other | |
199 | literal/length codes have a significantly lower probability of | |
200 | showing up at all.) However, by making the first table have a | |
201 | lookup of seven bits, the EOB code will be found in that first | |
202 | lookup, and so will not require that too many bits be pulled from | |
203 | the stream. | |
204 | */ | |
205 | ||
206 | ulg bb; /* bit buffer */ | |
207 | unsigned bk; /* bits in bit buffer */ | |
208 | ||
209 | ush mask_bits[] = { | |
210 | 0x0000, | |
211 | 0x0001, 0x0003, 0x0007, 0x000f, 0x001f, 0x003f, 0x007f, 0x00ff, | |
212 | 0x01ff, 0x03ff, 0x07ff, 0x0fff, 0x1fff, 0x3fff, 0x7fff, 0xffff | |
213 | }; | |
214 | ||
215 | #ifdef CRYPT | |
216 | uch cc; | |
217 | # define NEXTBYTE() \ | |
218 | (decrypt ? (cc = get_byte(), zdecode(cc), cc) : get_byte()) | |
219 | #else | |
220 | # define NEXTBYTE() (uch)get_byte() | |
221 | #endif | |
222 | #define NEEDBITS(n) {while(k<(n)){b|=((ulg)NEXTBYTE())<<k;k+=8;}} | |
223 | #define DUMPBITS(n) {b>>=(n);k-=(n);} | |
224 | ||
225 | ||
226 | /* | |
227 | Huffman code decoding is performed using a multi-level table lookup. | |
228 | The fastest way to decode is to simply build a lookup table whose | |
229 | size is determined by the longest code. However, the time it takes | |
230 | to build this table can also be a factor if the data being decoded | |
231 | is not very long. The most common codes are necessarily the | |
232 | shortest codes, so those codes dominate the decoding time, and hence | |
233 | the speed. The idea is you can have a shorter table that decodes the | |
234 | shorter, more probable codes, and then point to subsidiary tables for | |
235 | the longer codes. The time it costs to decode the longer codes is | |
236 | then traded against the time it takes to make longer tables. | |
237 | ||
238 | The results of this trade-off are in the variables lbits and dbits | |
239 | below. lbits is the number of bits the first level table for literal/ | |
240 | length codes can decode in one step, and dbits is the same thing for | |
241 | the distance codes. Subsequent tables are also less than or equal to | |
242 | those sizes. These values may be adjusted either when all of the | |
243 | codes are shorter than that, in which case the longest code length in | |
244 | bits is used, or when the shortest code is *longer* than the requested | |
245 | table size, in which case the length of the shortest code in bits is | |
246 | used. | |
247 | ||
248 | There are two different values for the two tables, since they code a | |
249 | different number of possibilities each. The literal/length table | |
250 | codes 286 possible values, or in a flat code, a little over eight | |
251 | bits. The distance table codes 30 possible values, or a little less | |
252 | than five bits, flat. The optimum values for speed end up being | |
253 | about one bit more than those, so lbits is 8+1 and dbits is 5+1. | |
254 | The optimum values may differ though from machine to machine, and | |
255 | possibly even between compilers. Your mileage may vary. | |
256 | */ | |
257 | ||
258 | ||
259 | int lbits = 9; /* bits in base literal/length lookup table */ | |
260 | int dbits = 6; /* bits in base distance lookup table */ | |
261 | ||
262 | ||
263 | /* If BMAX needs to be larger than 16, then h and x[] should be ulg. */ | |
264 | #define BMAX 16 /* maximum bit length of any code (16 for explode) */ | |
265 | #define N_MAX 288 /* maximum number of codes in any set */ | |
266 | ||
267 | ||
268 | unsigned hufts; /* track memory usage */ | |
269 | ||
270 | ||
271 | int huft_build(b, n, s, d, e, t, m) | |
272 | unsigned *b; /* code lengths in bits (all assumed <= BMAX) */ | |
273 | unsigned n; /* number of codes (assumed <= N_MAX) */ | |
274 | unsigned s; /* number of simple-valued codes (0..s-1) */ | |
275 | ush *d; /* list of base values for non-simple codes */ | |
276 | ush *e; /* list of extra bits for non-simple codes */ | |
277 | struct huft **t; /* result: starting table */ | |
278 | int *m; /* maximum lookup bits, returns actual */ | |
279 | /* Given a list of code lengths and a maximum table size, make a set of | |
280 | tables to decode that set of codes. Return zero on success, one if | |
281 | the given code set is incomplete (the tables are still built in this | |
282 | case), two if the input is invalid (all zero length codes or an | |
283 | oversubscribed set of lengths), and three if not enough memory. */ | |
284 | { | |
285 | unsigned a; /* counter for codes of length k */ | |
286 | unsigned c[BMAX+1]; /* bit length count table */ | |
287 | unsigned f; /* i repeats in table every f entries */ | |
288 | int g; /* maximum code length */ | |
289 | int h; /* table level */ | |
290 | register unsigned i; /* counter, current code */ | |
291 | register unsigned j; /* counter */ | |
292 | register int k; /* number of bits in current code */ | |
293 | int l; /* bits per table (returned in m) */ | |
294 | register unsigned *p; /* pointer into c[], b[], or v[] */ | |
295 | register struct huft *q; /* points to current table */ | |
296 | struct huft r; /* table entry for structure assignment */ | |
297 | struct huft *u[BMAX]; /* table stack */ | |
298 | unsigned v[N_MAX]; /* values in order of bit length */ | |
299 | register int w; /* bits before this table == (l * h) */ | |
300 | unsigned x[BMAX+1]; /* bit offsets, then code stack */ | |
301 | unsigned *xp; /* pointer into x */ | |
302 | int y; /* number of dummy codes added */ | |
303 | unsigned z; /* number of entries in current table */ | |
304 | ||
305 | ||
306 | /* Generate counts for each bit length */ | |
307 | memzero(c, sizeof(c)); | |
308 | p = b; i = n; | |
309 | do { | |
310 | Tracecv(*p, (stderr, (n-i >= ' ' && n-i <= '~' ? "%c %d\n" : "0x%x %d\n"), | |
311 | n-i, *p)); | |
312 | c[*p++]++; /* assume all entries <= BMAX */ | |
313 | } while (--i); | |
314 | if (c[0] == n) /* null input--all zero length codes */ | |
315 | { | |
316 | *t = (struct huft *)NULL; | |
317 | *m = 0; | |
318 | return 0; | |
319 | } | |
320 | ||
321 | ||
322 | /* Find minimum and maximum length, bound *m by those */ | |
323 | l = *m; | |
324 | for (j = 1; j <= BMAX; j++) | |
325 | if (c[j]) | |
326 | break; | |
327 | k = j; /* minimum code length */ | |
328 | if ((unsigned)l < j) | |
329 | l = j; | |
330 | for (i = BMAX; i; i--) | |
331 | if (c[i]) | |
332 | break; | |
333 | g = i; /* maximum code length */ | |
334 | if ((unsigned)l > i) | |
335 | l = i; | |
336 | *m = l; | |
337 | ||
338 | ||
339 | /* Adjust last length count to fill out codes, if needed */ | |
340 | for (y = 1 << j; j < i; j++, y <<= 1) | |
341 | if ((y -= c[j]) < 0) | |
342 | return 2; /* bad input: more codes than bits */ | |
343 | if ((y -= c[i]) < 0) | |
344 | return 2; | |
345 | c[i] += y; | |
346 | ||
347 | ||
348 | /* Generate starting offsets into the value table for each length */ | |
349 | x[1] = j = 0; | |
350 | p = c + 1; xp = x + 2; | |
351 | while (--i) { /* note that i == g from above */ | |
352 | *xp++ = (j += *p++); | |
353 | } | |
354 | ||
355 | ||
356 | /* Make a table of values in order of bit lengths */ | |
357 | p = b; i = 0; | |
358 | do { | |
359 | if ((j = *p++) != 0) | |
360 | v[x[j]++] = i; | |
361 | } while (++i < n); | |
362 | ||
363 | ||
364 | /* Generate the Huffman codes and for each, make the table entries */ | |
365 | x[0] = i = 0; /* first Huffman code is zero */ | |
366 | p = v; /* grab values in bit order */ | |
367 | h = -1; /* no tables yet--level -1 */ | |
368 | w = -l; /* bits decoded == (l * h) */ | |
369 | u[0] = (struct huft *)NULL; /* just to keep compilers happy */ | |
370 | q = (struct huft *)NULL; /* ditto */ | |
371 | z = 0; /* ditto */ | |
372 | ||
373 | /* go through the bit lengths (k already is bits in shortest code) */ | |
374 | for (; k <= g; k++) | |
375 | { | |
376 | a = c[k]; | |
377 | while (a--) | |
378 | { | |
379 | /* here i is the Huffman code of length k bits for value *p */ | |
380 | /* make tables up to required level */ | |
381 | while (k > w + l) | |
382 | { | |
383 | h++; | |
384 | w += l; /* previous table always l bits */ | |
385 | ||
386 | /* compute minimum size table less than or equal to l bits */ | |
387 | z = (z = g - w) > (unsigned)l ? l : z; /* upper limit on table size */ | |
388 | if ((f = 1 << (j = k - w)) > a + 1) /* try a k-w bit table */ | |
389 | { /* too few codes for k-w bit table */ | |
390 | f -= a + 1; /* deduct codes from patterns left */ | |
391 | xp = c + k; | |
392 | while (++j < z) /* try smaller tables up to z bits */ | |
393 | { | |
394 | if ((f <<= 1) <= *++xp) | |
395 | break; /* enough codes to use up j bits */ | |
396 | f -= *xp; /* else deduct codes from patterns */ | |
397 | } | |
398 | } | |
399 | z = 1 << j; /* table entries for j-bit table */ | |
400 | ||
401 | /* allocate and link in new table */ | |
402 | if ((q = (struct huft *)malloc((z + 1)*sizeof(struct huft))) == | |
403 | (struct huft *)NULL) | |
404 | { | |
405 | if (h) | |
406 | huft_free(u[0]); | |
407 | return 3; /* not enough memory */ | |
408 | } | |
409 | hufts += z + 1; /* track memory usage */ | |
410 | *t = q + 1; /* link to list for huft_free() */ | |
411 | *(t = &(q->v.t)) = (struct huft *)NULL; | |
412 | u[h] = ++q; /* table starts after link */ | |
413 | ||
414 | /* connect to last table, if there is one */ | |
415 | if (h) | |
416 | { | |
417 | x[h] = i; /* save pattern for backing up */ | |
418 | r.b = (uch)l; /* bits to dump before this table */ | |
419 | r.e = (uch)(16 + j); /* bits in this table */ | |
420 | r.v.t = q; /* pointer to this table */ | |
421 | j = i >> (w - l); /* (get around Turbo C bug) */ | |
422 | u[h-1][j] = r; /* connect to last table */ | |
423 | } | |
424 | } | |
425 | ||
426 | /* set up table entry in r */ | |
427 | r.b = (uch)(k - w); | |
428 | if (p >= v + n) | |
429 | r.e = 99; /* out of values--invalid code */ | |
430 | else if (*p < s) | |
431 | { | |
432 | r.e = (uch)(*p < 256 ? 16 : 15); /* 256 is end-of-block code */ | |
433 | r.v.n = (ush)(*p); /* simple code is just the value */ | |
434 | p++; /* one compiler does not like *p++ */ | |
435 | } | |
436 | else | |
437 | { | |
438 | r.e = (uch)e[*p - s]; /* non-simple--look up in lists */ | |
439 | r.v.n = d[*p++ - s]; | |
440 | } | |
441 | ||
442 | /* fill code-like entries with r */ | |
443 | f = 1 << (k - w); | |
444 | for (j = i >> w; j < z; j += f) | |
445 | q[j] = r; | |
446 | ||
447 | /* backwards increment the k-bit code i */ | |
448 | for (j = 1 << (k - 1); i & j; j >>= 1) | |
449 | i ^= j; | |
450 | i ^= j; | |
451 | ||
452 | /* backup over finished tables */ | |
453 | while ((i & ((1 << w) - 1)) != x[h]) | |
454 | { | |
455 | h--; /* don't need to update q */ | |
456 | w -= l; | |
457 | } | |
458 | } | |
459 | } | |
460 | ||
461 | ||
462 | /* Return true (1) if we were given an incomplete table */ | |
463 | return y != 0 && g != 1; | |
464 | } | |
465 | ||
466 | ||
467 | ||
468 | int huft_free(t) | |
469 | struct huft *t; /* table to free */ | |
470 | /* Free the malloc'ed tables built by huft_build(), which makes a linked | |
471 | list of the tables it made, with the links in a dummy first entry of | |
472 | each table. */ | |
473 | { | |
474 | register struct huft *p, *q; | |
475 | ||
476 | ||
477 | /* Go through linked list, freeing from the malloced (t[-1]) address. */ | |
478 | p = t; | |
479 | while (p != (struct huft *)NULL) | |
480 | { | |
481 | q = (--p)->v.t; | |
482 | free((char*)p); | |
483 | p = q; | |
484 | } | |
485 | return 0; | |
486 | } | |
487 | ||
488 | ||
489 | int inflate_codes(tl, td, bl, bd) | |
490 | struct huft *tl, *td; /* literal/length and distance decoder tables */ | |
491 | int bl, bd; /* number of bits decoded by tl[] and td[] */ | |
492 | /* inflate (decompress) the codes in a deflated (compressed) block. | |
493 | Return an error code or zero if it all goes ok. */ | |
494 | { | |
495 | register unsigned e; /* table entry flag/number of extra bits */ | |
496 | unsigned n, d; /* length and index for copy */ | |
497 | unsigned w; /* current window position */ | |
498 | struct huft *t; /* pointer to table entry */ | |
499 | unsigned ml, md; /* masks for bl and bd bits */ | |
500 | register ulg b; /* bit buffer */ | |
501 | register unsigned k; /* number of bits in bit buffer */ | |
502 | ||
503 | ||
504 | /* make local copies of globals */ | |
505 | b = bb; /* initialize bit buffer */ | |
506 | k = bk; | |
507 | w = wp; /* initialize window position */ | |
508 | ||
509 | /* inflate the coded data */ | |
510 | ml = mask_bits[bl]; /* precompute masks for speed */ | |
511 | md = mask_bits[bd]; | |
512 | for (;;) /* do until end of block */ | |
513 | { | |
514 | NEEDBITS((unsigned)bl) | |
515 | if ((e = (t = tl + ((unsigned)b & ml))->e) > 16) | |
516 | do { | |
517 | if (e == 99) | |
518 | return 1; | |
519 | DUMPBITS(t->b) | |
520 | e -= 16; | |
521 | NEEDBITS(e) | |
522 | } while ((e = (t = t->v.t + ((unsigned)b & mask_bits[e]))->e) > 16); | |
523 | DUMPBITS(t->b) | |
524 | if (e == 16) /* then it's a literal */ | |
525 | { | |
526 | slide[w++] = (uch)t->v.n; | |
527 | Tracevv((stderr, "%c", slide[w-1])); | |
528 | if (w == WSIZE) | |
529 | { | |
530 | flush_output(w); | |
531 | w = 0; | |
532 | } | |
533 | } | |
534 | else /* it's an EOB or a length */ | |
535 | { | |
536 | /* exit if end of block */ | |
537 | if (e == 15) | |
538 | break; | |
539 | ||
540 | /* get length of block to copy */ | |
541 | NEEDBITS(e) | |
542 | n = t->v.n + ((unsigned)b & mask_bits[e]); | |
543 | DUMPBITS(e); | |
544 | ||
545 | /* decode distance of block to copy */ | |
546 | NEEDBITS((unsigned)bd) | |
547 | if ((e = (t = td + ((unsigned)b & md))->e) > 16) | |
548 | do { | |
549 | if (e == 99) | |
550 | return 1; | |
551 | DUMPBITS(t->b) | |
552 | e -= 16; | |
553 | NEEDBITS(e) | |
554 | } while ((e = (t = t->v.t + ((unsigned)b & mask_bits[e]))->e) > 16); | |
555 | DUMPBITS(t->b) | |
556 | NEEDBITS(e) | |
557 | d = w - t->v.n - ((unsigned)b & mask_bits[e]); | |
558 | DUMPBITS(e) | |
559 | Tracevv((stderr,"\\[%d,%d]", w-d, n)); | |
560 | ||
561 | /* do the copy */ | |
562 | do { | |
563 | n -= (e = (e = WSIZE - ((d &= WSIZE-1) > w ? d : w)) > n ? n : e); | |
564 | #if !defined(NOMEMCPY) && !defined(DEBUG) | |
565 | if (w - d >= e) /* (this test assumes unsigned comparison) */ | |
566 | { | |
567 | memcpy(slide + w, slide + d, e); | |
568 | w += e; | |
569 | d += e; | |
570 | } | |
571 | else /* do it slow to avoid memcpy() overlap */ | |
572 | #endif /* !NOMEMCPY */ | |
573 | do { | |
574 | slide[w++] = slide[d++]; | |
575 | Tracevv((stderr, "%c", slide[w-1])); | |
576 | } while (--e); | |
577 | if (w == WSIZE) | |
578 | { | |
579 | flush_output(w); | |
580 | w = 0; | |
581 | } | |
582 | } while (n); | |
583 | } | |
584 | } | |
585 | ||
586 | ||
587 | /* restore the globals from the locals */ | |
588 | wp = w; /* restore global window pointer */ | |
589 | bb = b; /* restore global bit buffer */ | |
590 | bk = k; | |
591 | ||
592 | /* done */ | |
593 | return 0; | |
594 | } | |
595 | ||
596 | ||
597 | ||
598 | int inflate_stored() | |
599 | /* "decompress" an inflated type 0 (stored) block. */ | |
600 | { | |
601 | unsigned n; /* number of bytes in block */ | |
602 | unsigned w; /* current window position */ | |
603 | register ulg b; /* bit buffer */ | |
604 | register unsigned k; /* number of bits in bit buffer */ | |
605 | ||
606 | ||
607 | /* make local copies of globals */ | |
608 | b = bb; /* initialize bit buffer */ | |
609 | k = bk; | |
610 | w = wp; /* initialize window position */ | |
611 | ||
612 | ||
613 | /* go to byte boundary */ | |
614 | n = k & 7; | |
615 | DUMPBITS(n); | |
616 | ||
617 | ||
618 | /* get the length and its complement */ | |
619 | NEEDBITS(16) | |
620 | n = ((unsigned)b & 0xffff); | |
621 | DUMPBITS(16) | |
622 | NEEDBITS(16) | |
623 | if (n != (unsigned)((~b) & 0xffff)) | |
624 | return 1; /* error in compressed data */ | |
625 | DUMPBITS(16) | |
626 | ||
627 | ||
628 | /* read and output the compressed data */ | |
629 | while (n--) | |
630 | { | |
631 | NEEDBITS(8) | |
632 | slide[w++] = (uch)b; | |
633 | if (w == WSIZE) | |
634 | { | |
635 | flush_output(w); | |
636 | w = 0; | |
637 | } | |
638 | DUMPBITS(8) | |
639 | } | |
640 | ||
641 | ||
642 | /* restore the globals from the locals */ | |
643 | wp = w; /* restore global window pointer */ | |
644 | bb = b; /* restore global bit buffer */ | |
645 | bk = k; | |
646 | return 0; | |
647 | } | |
648 | ||
649 | ||
650 | ||
651 | int inflate_fixed() | |
652 | /* decompress an inflated type 1 (fixed Huffman codes) block. We should | |
653 | either replace this with a custom decoder, or at least precompute the | |
654 | Huffman tables. */ | |
655 | { | |
656 | int i; /* temporary variable */ | |
657 | struct huft *tl; /* literal/length code table */ | |
658 | struct huft *td; /* distance code table */ | |
659 | int bl; /* lookup bits for tl */ | |
660 | int bd; /* lookup bits for td */ | |
661 | unsigned l[288]; /* length list for huft_build */ | |
662 | ||
663 | ||
664 | /* set up literal table */ | |
665 | for (i = 0; i < 144; i++) | |
666 | l[i] = 8; | |
667 | for (; i < 256; i++) | |
668 | l[i] = 9; | |
669 | for (; i < 280; i++) | |
670 | l[i] = 7; | |
671 | for (; i < 288; i++) /* make a complete, but wrong code set */ | |
672 | l[i] = 8; | |
673 | bl = 7; | |
674 | if ((i = huft_build(l, 288, 257, cplens, cplext, &tl, &bl)) != 0) | |
675 | return i; | |
676 | ||
677 | ||
678 | /* set up distance table */ | |
679 | for (i = 0; i < 30; i++) /* make an incomplete code set */ | |
680 | l[i] = 5; | |
681 | bd = 5; | |
682 | if ((i = huft_build(l, 30, 0, cpdist, cpdext, &td, &bd)) > 1) | |
683 | { | |
684 | huft_free(tl); | |
685 | return i; | |
686 | } | |
687 | ||
688 | ||
689 | /* decompress until an end-of-block code */ | |
690 | if (inflate_codes(tl, td, bl, bd)) | |
691 | return 1; | |
692 | ||
693 | ||
694 | /* free the decoding tables, return */ | |
695 | huft_free(tl); | |
696 | huft_free(td); | |
697 | return 0; | |
698 | } | |
699 | ||
700 | ||
701 | ||
702 | int inflate_dynamic() | |
703 | /* decompress an inflated type 2 (dynamic Huffman codes) block. */ | |
704 | { | |
705 | int i; /* temporary variables */ | |
706 | unsigned j; | |
707 | unsigned l; /* last length */ | |
708 | unsigned m; /* mask for bit lengths table */ | |
709 | unsigned n; /* number of lengths to get */ | |
710 | struct huft *tl; /* literal/length code table */ | |
711 | struct huft *td; /* distance code table */ | |
712 | int bl; /* lookup bits for tl */ | |
713 | int bd; /* lookup bits for td */ | |
714 | unsigned nb; /* number of bit length codes */ | |
715 | unsigned nl; /* number of literal/length codes */ | |
716 | unsigned nd; /* number of distance codes */ | |
717 | #ifdef PKZIP_BUG_WORKAROUND | |
718 | unsigned ll[288+32]; /* literal/length and distance code lengths */ | |
719 | #else | |
720 | unsigned ll[286+30]; /* literal/length and distance code lengths */ | |
721 | #endif | |
722 | register ulg b; /* bit buffer */ | |
723 | register unsigned k; /* number of bits in bit buffer */ | |
724 | ||
725 | ||
726 | /* make local bit buffer */ | |
727 | b = bb; | |
728 | k = bk; | |
729 | ||
730 | ||
731 | /* read in table lengths */ | |
732 | NEEDBITS(5) | |
733 | nl = 257 + ((unsigned)b & 0x1f); /* number of literal/length codes */ | |
734 | DUMPBITS(5) | |
735 | NEEDBITS(5) | |
736 | nd = 1 + ((unsigned)b & 0x1f); /* number of distance codes */ | |
737 | DUMPBITS(5) | |
738 | NEEDBITS(4) | |
739 | nb = 4 + ((unsigned)b & 0xf); /* number of bit length codes */ | |
740 | DUMPBITS(4) | |
741 | #ifdef PKZIP_BUG_WORKAROUND | |
742 | if (nl > 288 || nd > 32) | |
743 | #else | |
744 | if (nl > 286 || nd > 30) | |
745 | #endif | |
746 | return 1; /* bad lengths */ | |
747 | ||
748 | ||
749 | /* read in bit-length-code lengths */ | |
750 | for (j = 0; j < nb; j++) | |
751 | { | |
752 | NEEDBITS(3) | |
753 | ll[border[j]] = (unsigned)b & 7; | |
754 | DUMPBITS(3) | |
755 | } | |
756 | for (; j < 19; j++) | |
757 | ll[border[j]] = 0; | |
758 | ||
759 | ||
760 | /* build decoding table for trees--single level, 7 bit lookup */ | |
761 | bl = 7; | |
762 | if ((i = huft_build(ll, 19, 19, NULL, NULL, &tl, &bl)) != 0) | |
763 | { | |
764 | if (i == 1) | |
765 | huft_free(tl); | |
766 | return i; /* incomplete code set */ | |
767 | } | |
768 | ||
769 | ||
770 | /* read in literal and distance code lengths */ | |
771 | n = nl + nd; | |
772 | m = mask_bits[bl]; | |
773 | i = l = 0; | |
774 | while ((unsigned)i < n) | |
775 | { | |
776 | NEEDBITS((unsigned)bl) | |
777 | j = (td = tl + ((unsigned)b & m))->b; | |
778 | DUMPBITS(j) | |
779 | j = td->v.n; | |
780 | if (j < 16) /* length of code in bits (0..15) */ | |
781 | ll[i++] = l = j; /* save last length in l */ | |
782 | else if (j == 16) /* repeat last length 3 to 6 times */ | |
783 | { | |
784 | NEEDBITS(2) | |
785 | j = 3 + ((unsigned)b & 3); | |
786 | DUMPBITS(2) | |
787 | if ((unsigned)i + j > n) | |
788 | return 1; | |
789 | while (j--) | |
790 | ll[i++] = l; | |
791 | } | |
792 | else if (j == 17) /* 3 to 10 zero length codes */ | |
793 | { | |
794 | NEEDBITS(3) | |
795 | j = 3 + ((unsigned)b & 7); | |
796 | DUMPBITS(3) | |
797 | if ((unsigned)i + j > n) | |
798 | return 1; | |
799 | while (j--) | |
800 | ll[i++] = 0; | |
801 | l = 0; | |
802 | } | |
803 | else /* j == 18: 11 to 138 zero length codes */ | |
804 | { | |
805 | NEEDBITS(7) | |
806 | j = 11 + ((unsigned)b & 0x7f); | |
807 | DUMPBITS(7) | |
808 | if ((unsigned)i + j > n) | |
809 | return 1; | |
810 | while (j--) | |
811 | ll[i++] = 0; | |
812 | l = 0; | |
813 | } | |
814 | } | |
815 | ||
816 | ||
817 | /* free decoding table for trees */ | |
818 | huft_free(tl); | |
819 | ||
820 | ||
821 | /* restore the global bit buffer */ | |
822 | bb = b; | |
823 | bk = k; | |
824 | ||
825 | ||
826 | /* build the decoding tables for literal/length and distance codes */ | |
827 | bl = lbits; | |
828 | if ((i = huft_build(ll, nl, 257, cplens, cplext, &tl, &bl)) != 0) | |
829 | { | |
830 | if (i == 1) { | |
831 | fprintf(stderr, " incomplete literal tree\n"); | |
832 | huft_free(tl); | |
833 | } | |
834 | return i; /* incomplete code set */ | |
835 | } | |
836 | bd = dbits; | |
837 | if ((i = huft_build(ll + nl, nd, 0, cpdist, cpdext, &td, &bd)) != 0) | |
838 | { | |
839 | if (i == 1) { | |
840 | fprintf(stderr, " incomplete distance tree\n"); | |
841 | #ifdef PKZIP_BUG_WORKAROUND | |
842 | i = 0; | |
843 | } | |
844 | #else | |
845 | huft_free(td); | |
846 | } | |
847 | huft_free(tl); | |
848 | return i; /* incomplete code set */ | |
849 | #endif | |
850 | } | |
851 | ||
852 | ||
853 | /* decompress until an end-of-block code */ | |
854 | if (inflate_codes(tl, td, bl, bd)) | |
855 | return 1; | |
856 | ||
857 | ||
858 | /* free the decoding tables, return */ | |
859 | huft_free(tl); | |
860 | huft_free(td); | |
861 | return 0; | |
862 | } | |
863 | ||
864 | ||
865 | ||
866 | int inflate_block(e) | |
867 | int *e; /* last block flag */ | |
868 | /* decompress an inflated block */ | |
869 | { | |
870 | unsigned t; /* block type */ | |
871 | register ulg b; /* bit buffer */ | |
872 | register unsigned k; /* number of bits in bit buffer */ | |
873 | ||
874 | ||
875 | /* make local bit buffer */ | |
876 | b = bb; | |
877 | k = bk; | |
878 | ||
879 | ||
880 | /* read in last block bit */ | |
881 | NEEDBITS(1) | |
882 | *e = (int)b & 1; | |
883 | DUMPBITS(1) | |
884 | ||
885 | ||
886 | /* read in block type */ | |
887 | NEEDBITS(2) | |
888 | t = (unsigned)b & 3; | |
889 | DUMPBITS(2) | |
890 | ||
891 | ||
892 | /* restore the global bit buffer */ | |
893 | bb = b; | |
894 | bk = k; | |
895 | ||
896 | ||
897 | /* inflate that block type */ | |
898 | if (t == 2) | |
899 | return inflate_dynamic(); | |
900 | if (t == 0) | |
901 | return inflate_stored(); | |
902 | if (t == 1) | |
903 | return inflate_fixed(); | |
904 | ||
905 | ||
906 | /* bad block type */ | |
907 | return 2; | |
908 | } | |
909 | ||
910 | ||
911 | ||
912 | int inflate() | |
913 | /* decompress an inflated entry */ | |
914 | { | |
915 | int e; /* last block flag */ | |
916 | int r; /* result code */ | |
917 | unsigned h; /* peak number of struct huft's malloc'ed for any one block */ | |
918 | ||
919 | ||
920 | /* initialize window, bit buffer */ | |
921 | wp = 0; | |
922 | bk = 0; | |
923 | bb = 0; | |
924 | ||
925 | ||
926 | /* decompress until the last block */ | |
927 | h = 0; | |
928 | do { | |
929 | hufts = 0; | |
930 | if ((r = inflate_block(&e)) != 0) | |
931 | return r; | |
932 | if (hufts > h) | |
933 | h = hufts; | |
934 | } while (!e); | |
935 | ||
936 | /* Undo too much lookahead. The next read will be byte aligned so we | |
937 | * can discard unused bits in the last meaningful byte. | |
938 | */ | |
939 | while (bk >= 8) { | |
940 | bk -= 8; | |
941 | inptr--; | |
942 | } | |
943 | ||
944 | /* flush out slide */ | |
945 | flush_output(wp); | |
946 | ||
947 | ||
948 | /* return success */ | |
949 | #ifdef DEBUG | |
950 | fprintf(stderr, "<%u> ", h); | |
951 | #endif /* DEBUG */ | |
952 | return 0; | |
953 | } |