# Overview #

A few thoughts for use in future obfuscated programs.


# Digraphs, Trigraphs and Syntax Highlighting #

At this point, trigraphs betray their presence by requiring compiler flags
(GCC, for instance, ignores them unless given `-trigraphs` or a strict ISO
`-std=` mode, and C23 drops them altogether), making any direct benefit for
obfuscation dubious.

However, trigraphs cause many syntax highlighters to highlight source code
incorrectly. In the following snippet, for example, highlighters that miss the
trigraph `??/` (which becomes `\`) frequently display the `exit(0);` line as
code rather than as comment. The `\` escapes the newline, splicing the next
line onto the `//` comment and turning it into a two-line comment.

    // Should I exit early?????/
    exit(0);
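
The mechanism can be made concrete with a small check that reports whether a
physical line gets spliced onto the next one in translation phase 2, which is
what happens to the comment above when trigraphs are enabled. This is a
minimal sketch: `line_is_spliced` and its `trigraphs` flag are hypothetical
names chosen for illustration, with the flag standing in for whether the
compiler replaces trigraphs at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if the physical line `line` (given
 * without its newline) will be spliced onto the following line in
 * translation phase 2, i.e. if it ends in a backslash once trigraphs
 * have been replaced.
 *
 * `trigraphs` models whether the compiler replaces trigraphs at all; a
 * syntax highlighter that ignores them behaves as if it were false.
 *
 * Simplification: only a backslash that is the very last character counts,
 * as the standard requires. GCC also accepts trailing whitespace after the
 * backslash (with a warning); that case is not modelled here. */
bool line_is_spliced(const char *line, bool trigraphs)
{
    assert(line != NULL);
    size_t len = strlen(line);

    /* A literal backslash at the end always splices. */
    if (len >= 1 && line[len - 1] == '\\')
        return true;

    /* Question mark, question mark, slash is the trigraph for a backslash.
     * No trigraph ends in '?', so the first '?' of a trailing match cannot
     * belong to an earlier trigraph: the match is always genuine. */
    if (trigraphs && len >= 3 &&
        line[len - 3] == '?' && line[len - 2] == '?' && line[len - 1] == '/')
        return true;

    return false;
}
```

For the comment line in the snippet, the function returns true with
`trigraphs` set and false without it, which is exactly the disagreement
between a trigraph-enabled compiler and a trigraph-unaware highlighter. Note
that writing that line as a C string literal requires escaping the question
marks (`\?`), or a trigraph-enabled compiler would replace the trigraph inside
the literal itself.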

As long as syntax highlighting is kept sane elsewhere in an obfuscated program,
the reader may gradually come to trust it, perhaps allowing an instance or two
of trigraph-induced highlighting failure to slip past unnoticed.

Of course, readers may run the equivalent of a search-and-replace, condensing
each trigraph to its single-character equivalent. Since the preprocessor
performs exactly this replacement before any other processing (translation
phase 1), this is safe. Digraphs, on the other hand, are recognized during
tokenization, so a simple search-and-replace by the reader is not necessarily
a safe transformation of the source code. Is it possible to include two
important digraphs hidden amongst frivolous usage, such that

- one digraph breaks syntax highlighting in a useful way, much like the
  trigraph example above, and

- the other isn't a real digraph at all, but rather something that breaks the
  program if digraphs are converted with a simple search-and-replace?
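
Why the trigraph search-and-replace is safe can be seen from a sketch of
translation phase 1: every trigraph is replaced with no regard for string
literals, comments or token boundaries, and the output is not rescanned, just
as the preprocessor itself does it. `replace_trigraphs` is a hypothetical
helper written for this note, not an existing library function.

```c
#include <assert.h>
#include <string.h>

/* Given the character that follows two question marks, returns the character
 * the trigraph stands for, or 0 if the sequence is not a trigraph. These are
 * the nine trigraphs defined by ISO C before C23. */
static char trigraph_target(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Hypothetical helper: copies `src` to `dst`, replacing every trigraph with
 * its single character. Like translation phase 1, this ignores all context:
 * trigraphs inside string literals and comments are replaced too, and the
 * output is not rescanned. `dst` needs room for strlen(src) + 1 bytes, since
 * the output never grows. */
void replace_trigraphs(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        char r;
        /* src[1] is read only if src[0] is '?', and src[2] only if src[1]
         * is '?', so the reads never run past the terminating null. */
        if (src[0] == '?' && src[1] == '?' && (r = trigraph_target(src[2])) != 0) {
            *dst++ = r;
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Run on the comment line from the earlier snippet, it produces
`// Should I exit early???\`: the line now visibly ends in a backslash, so the
`exit(0);` below it is part of the comment. Since a trigraph-enabled compiler
performs the same context-free substitution first, running this over a whole
file changes nothing the compiler sees.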

One possible 'false' digraph would embed the characters inside another token,
perhaps a multi-part string split across multiple lines. If a naive
search-and-replace would turn that string into something syntax-breaking, the
reader may skip digraph conversion before reading the source, despite knowing
such digraphs are there, and may thus be tricked into believing the lies of
their syntax highlighter.
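
The reader's naive digraph conversion can be sketched the same way: a plain
textual replacer that, unlike the tokenizer, also rewrites digraph spellings
inside string literals, character constants and comments.
`naive_replace_digraphs` is a hypothetical name, and the 'false' digraph
discussed after the code is an invented illustration rather than something
taken from an existing program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The five two-character digraphs and the punctuators they spell. "%:%:"
 * needs no entry of its own: it is two "%:" in a row and comes out as "##". */
static const struct {
    const char *from;
    char to;
} digraphs[] = {
    { "<:", '[' }, { ":>", ']' }, { "<%", '{' }, { "%>", '}' }, { "%:", '#' },
};

/* Hypothetical helper, deliberately naive: a left-to-right search-and-replace
 * with no notion of tokens, so it also rewrites digraph spellings inside
 * string literals, character constants and comments, which the compiler
 * leaves untouched. `dst` needs room for strlen(src) + 1 bytes, since the
 * output never grows. */
void naive_replace_digraphs(const char *src, char *dst)
{
    const size_t count = sizeof digraphs / sizeof digraphs[0];

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        size_t i;
        /* At most one entry can match here: no two share both characters. */
        for (i = 0; i < count; i++) {
            if (strncmp(src, digraphs[i].from, 2) == 0)
                break;
        }
        if (i < count) {
            *dst++ = digraphs[i].to;
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

On genuine digraphs it behaves: `int a<:2:> = <%1, 2%>;` becomes
`int a[2] = {1, 2};`. On a false one it does not. A C11 line such as
`_Static_assert(sizeof "<:" == 3, "false digraph");` compiles as written,
since the literal holds two characters plus the terminating null, but after
replacement it reads `sizeof "[" == 3` and the assertion fails at compile
time. A program seeded with a few such lines would punish exactly the
search-and-replace that would otherwise expose a digraph-based highlighting
trick.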

I suppose that leads to the natural question: do people typically do a
search-and-replace for digraphs when reading obfuscated code, or do they use a
more language-aware method?