Added a `brainstorming.md` file with misc ideas for future obfuscation projects.
# Overview #

A few thoughts for use in future obfuscated programs.


# Digraphs, Trigraphs and Syntax Highlighting #

Trigraphs now betray their presence by requiring compiler flags (GCC, for
example, ignores them unless given `-trigraphs` or a strict ISO `-std=` mode),
making any direct benefit for obfuscation dubious.

However, trigraphs cause many syntax highlighters to highlight source code
incorrectly. For example, highlighters that miss the conversion of the trigraph
`??/` to `\` frequently display the `exit(0);` line of the following snippet as
code rather than as part of the comment. The resulting backslash escapes the
newline, splicing both lines into a single two-line comment.

    // Should I exit early?????/
    exit(0);

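To make the replacement step concrete, here is a minimal sketch of what the compiler does to that comment line before anything else happens. The helper name `replace_trigraphs` is made up for this note, and the code only mimics the trigraph pass; it is not taken from any real compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: mimics the trigraph replacement pass that runs
 * before any other processing. Writes the result to `out`, which must hold
 * at least strlen(in) + 1 bytes. Returns the number of trigraphs replaced. */
static size_t replace_trigraphs(const char *in, char *out)
{
    /* Third character of each trigraph and the character it stands for. */
    static const char third[]  = "=(/)'<!>-";
    static const char single[] = "#[\\]^{|}~";
    size_t count = 0;

    assert(in != NULL && out != NULL);

    while (*in != '\0') {
        const char *hit = NULL;

        /* A trigraph is two question marks plus one of the nine characters. */
        if (in[0] == '?' && in[1] == '?' && in[2] != '\0')
            hit = strchr(third, in[2]);

        if (hit != NULL) {
            *out++ = single[hit - third];
            in += 3;
            count++;
        } else {
            /* Surplus question marks are copied through unchanged. */
            *out++ = *in++;
        }
    }
    *out = '\0';
    return count;
}
```

Fed the comment line above, the sketch leaves the first three question marks alone and turns the final `??/` into `\`, so the line ends in a backslash and swallows the `exit(0);` that follows. When passing that line as a C string literal, writing the question marks as `\?` keeps the demonstration itself free of trigraphs.
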
As long as syntax highlighting is kept sane elsewhere in an obfuscated program,
the reader may gradually come to trust it, perhaps allowing an instance or two
of trigraph-induced highlighting failure to slip past.

Of course, readers may run the equivalent of a search-and-replace, condensing
trigraphs to their single-character equivalents. Since the compiler performs
the same replacement in the first translation phase, before any other
processing, this is safe. Digraphs, on the other hand, are recognized during
tokenization, so a simple search-and-replace by the reader is not necessarily
a safe transformation of the source code. Is it possible to include two
important digraphs hidden amongst frivolous usage, such that

  - one digraph breaks syntax highlighting in a useful way, like the example
    demonstrated above, and

  - the other digraph isn't a real digraph, rather being something which breaks
    the program if digraphs are converted with a simple search-and-replace?

One possible 'false' digraph would embed the characters inside another token,
perhaps a string literal split across multiple lines. If a naive
search-and-replace would convert that string into something that breaks the
program, then the reader may avoid doing a digraph conversion before reading
the source, despite knowing such digraphs are there, and thus may be tricked
into believing lies from their syntax highlighter.

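As a rough illustration of the 'false' digraph idea, the sketch below performs the kind of purely textual replacement a reader might try. The helper name `naive_digraph_replace` is invented for this note, and the function deliberately ignores token boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a deliberately naive, purely textual digraph
 * replacement, like an editor's search-and-replace. `out` must hold at
 * least strlen(in) + 1 bytes. */
static void naive_digraph_replace(const char *in, char *out)
{
    static const char *const digraph[] = { "<:", ":>", "<%", "%>", "%:" };
    static const char single[]         = { '[',  ']',  '{',  '}',  '#'  };

    assert(in != NULL && out != NULL);

    while (*in != '\0') {
        size_t i;

        /* Look for any digraph spelling at the current position. */
        for (i = 0; i < 5; i++) {
            if (in[0] == digraph[i][0] && in[1] == digraph[i][1])
                break;
        }

        if (i < 5) {
            *out++ = single[i];   /* replaced regardless of context */
            in += 2;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Given `int a<:2:> = <% 1, 2 %>; puts("a<:0:>");`, the sketch produces `int a[2] = { 1, 2 }; puts("a[0]");`. The real digraphs are converted correctly, but the characters inside the string literal are rewritten too, so the converted program prints different text than the original.
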
I suppose that leads to the natural question: Do people typically do a
search-and-replace for digraphs when reading obfuscated code, or do they use a
more language-aware method?
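
For comparison, a slightly more language-aware conversion might look like the sketch below, which leaves string and character literals untouched. The names `digraph_at` and `literal_aware_digraph_replace` are invented for this note, and the code assumes the input contains no comments, header names, or backslash-newline splices, all of which a real tokenizer would also have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the single-character spelling of the
 * digraph starting at `p`, or '\0' if no digraph starts there. */
static char digraph_at(const char *p)
{
    static const char *const digraph[] = { "<:", ":>", "<%", "%>", "%:" };
    static const char single[]         = { '[',  ']',  '{',  '}',  '#'  };
    size_t i;

    for (i = 0; i < 5; i++) {
        if (p[0] == digraph[i][0] && p[1] == digraph[i][1])
            return single[i];
    }
    return '\0';
}

/* Hypothetical helper: converts digraphs outside of string and character
 * literals only. `out` must hold at least strlen(in) + 1 bytes. */
static void literal_aware_digraph_replace(const char *in, char *out)
{
    char quote = '\0';  /* '"' or '\'' while inside a literal, else '\0' */

    assert(in != NULL && out != NULL);

    while (*in != '\0') {
        if (quote != '\0') {
            if (in[0] == '\\' && in[1] != '\0') {
                *out++ = *in++;   /* copy the backslash */
                *out++ = *in++;   /* and the character it escapes */
                continue;
            }
            if (*in == quote)
                quote = '\0';     /* closing quote ends the literal */
            *out++ = *in++;
        } else if (*in == '"' || *in == '\'') {
            quote = *in;          /* opening quote starts a literal */
            *out++ = *in++;
        } else if (digraph_at(in) != '\0') {
            *out++ = digraph_at(in);
            in += 2;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

With this version, `int a<:2:> = <% 1, 2 %>; puts("a<:0:>");` becomes `int a[2] = { 1, 2 }; puts("a<:0:>");`, leaving the string contents alone. A reader armed with even this much tooling would likely sidestep the 'false' digraph trick, which is why the question above matters.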