# Overview #

A few thoughts for use in future obfuscated programs.


# Digraphs, Trigraphs and Syntax Highlighting #

At this point, trigraphs betray their presence by requiring compiler flags
(GCC, for example, ignores them unless given `-trigraphs` or a strict ISO
`-std=` mode), making any direct benefit for obfuscation dubious.

However, trigraphs cause many syntax highlighting packages to highlight the
source code incorrectly. For example, the following snippet is frequently
displayed with the `exit(0);` line as code rather than as part of the comment.
Highlighters that ignore trigraphs miss that the trailing `??/` converts to
`\`, which escapes the newline and creates a two-line comment. The three
question marks before it form no trigraph and pass through unchanged.

    // Should I exit early?????/
    exit(0);

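Whether the snippet above really swallows its second line depends on how the
file is compiled. The following is a minimal sketch of a self-check, assuming
a C99-or-later compiler. The function names `trigraphs_enabled` and
`assignment_survives` are made up for illustration. The first looks at whether
`??/n` inside a string literal was collapsed into a newline escape, and the
second repeats the comment trick on an assignment instead of `exit(0);`.

```c
/* Returns 1 when the compiler replaced trigraphs in this file.
 * With trigraphs on, the literal below becomes a one-character string
 * (2 bytes including the terminating NUL); with them off it keeps all
 * four characters (5 bytes including the NUL). */
int trigraphs_enabled(void)
{
    return sizeof("??/n") == 2;
}

/* Returns 1 when the assignment after the comment is really executed,
 * and 0 when the trigraph spliced that line into the comment. */
int assignment_survives(void)
{
    int survived = 0;
    // Is the next line still code?????/
    survived = 1;
    return survived;
}
```

The two results should always be opposites. Under GCC, adding `-trigraphs`
should flip both, and GCC typically warns about the comment either way, which
is one more way trigraphs betray their presence.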
As long as syntax highlighting is kept sane elsewhere in an obfuscated program,
the reader may gradually come to trust it, perhaps allowing an instance or two
of trigraph-induced syntax highlighting failure to slip past.

Of course, readers may run the equivalent of a search and replace, condensing
trigraphs to their single-character equivalents. Since the C preprocessor
performs an equivalent replacement before any other processing, this is safe.
Digraphs, on the other hand, are handled during tokenization, so a simple
search-and-replace by the reader is not necessarily a safe transformation of
the source code. Is it possible to include two important digraphs hidden
amongst frivolous usage, such that

- one digraph breaks syntax highlighting in a useful way, like the example
  demonstrated above, and

- the other digraph isn't a real digraph, rather being something which breaks
  the program if digraphs are converted with a simple search-and-replace?

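To make the tokenization point concrete, here is a small sketch, assuming a
compiler with digraph support (part of C since the 1995 amendment, so C99 and
later). The names `sum_pair` and `literal_length` are hypothetical. The first
function is written with digraphs in place of braces and brackets, while the
second shows that the same character pairs inside a string literal are left
alone.

```c
#include <string.h>

/* Outside literals, <% %> act as braces and <: :> act as brackets. */
int sum_pair(void)
<%
    int pair<:2:> = <% 20, 22 %>;
    return pair<:0:> + pair<:1:>;
%>

/* Inside a string literal the same characters stay as written,
 * so this string holds four characters rather than two. */
size_t literal_length(void)
{
    return strlen("<:%>");
}
```

Here `sum_pair()` returns 42 and `literal_length()` returns 4. A textual
replacement of every `<:` and `%>` in this file would leave the first function
equivalent but would shorten the string in the second.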
One possible example of the 'false' digraph would be embedding the characters
inside another token, perhaps a multi-part string split across multiple lines?
If a naive search-and-replace would convert the string into something
syntax-breaking, then the reader may avoid doing a digraph conversion before
reading the source, despite knowing such digraphs are there, and thus may be
tricked into believing lies from their syntax highlighter.

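As a rough illustration of the difference between the two approaches, the
sketch below performs the replacement both ways. Everything here is
hypothetical: `digraph_at` and `replace_digraphs` are invented names, only the
four bracket and brace digraphs are handled, and the "aware" mode knows about
double-quoted string literals only, not character constants or comments. In
this example the naive pass does not break syntax, but it does silently change
what the program would print.

```c
#include <string.h>

/* If s starts with a digraph, return its single-character
 * equivalent, otherwise 0. %: and %:%: are left out for brevity. */
static char digraph_at(const char *s)
{
    if (s[0] == '<' && s[1] == ':') return '[';
    if (s[0] == ':' && s[1] == '>') return ']';
    if (s[0] == '<' && s[1] == '%') return '{';
    if (s[0] == '%' && s[1] == '>') return '}';
    return 0;
}

/* Copy src to dst, replacing digraphs. With skip_literals == 0 this is
 * the naive search-and-replace; with skip_literals == 1, text inside
 * double-quoted strings is copied untouched.
 * dst must have room for strlen(src) + 1 bytes. */
void replace_digraphs(const char *src, char *dst, int skip_literals)
{
    int in_string = 0;
    while (*src) {
        if (skip_literals && in_string) {
            if (*src == '\\' && src[1]) {
                *dst++ = *src++;        /* copy the backslash of an escape */
            } else if (*src == '"') {
                in_string = 0;          /* closing quote ends the literal */
            }
            *dst++ = *src++;
            continue;
        }
        if (*src == '"') {
            in_string = 1;
            *dst++ = *src++;
            continue;
        }
        char c = digraph_at(src);
        if (c) {
            *dst++ = c;
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Given the input `puts("<:-)"); int a<:2:>;`, the naive mode rewrites the string
to `"[-)"` along with the array declaration, while the literal-aware mode
changes only the declaration.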
I suppose that leads to the natural question: Do people typically do a
search-and-replace for digraphs when reading obfuscated code, or do they use a
more language-aware method?