Overview

A few thoughts for use in future obfuscated programs.

Digraphs, Trigraphs and Syntax Highlighting

At this point, trigraphs betray their presence by requiring compiler flags (GCC, for example, ignores them unless given -trigraphs or a strict ISO mode such as -std=c99), making any direct benefit for obfuscation dubious.

However, trigraphs cause many syntax highlighters to highlight the source code incorrectly. For example, highlighters that miss the trigraph ??/ (which converts to \) frequently display the exit(0); line in the following snippet as code rather than comment. The resulting backslash escapes the newline, splicing the two lines into a single comment.

// Should I exit early?????/
exit(0);
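
To make the splice concrete, the replacement pass can be sketched as a small C function. This is only an illustration of the trigraph step under stated assumptions, not how any particular preprocessor implements it: replace_trigraphs and the two lookup strings are hypothetical names, and the caller is assumed to supply an output buffer at least as large as the input. The tables store only the third character of each trigraph, so the sketch contains no trigraphs itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Third character of each of the nine trigraphs (two question marks plus
   one of these) and the single character it stands for, index-aligned. */
static const char TRI_THIRD[] = "=/'()!<>-";
static const char TRI_REPL[]  = "#\\^[]|{}~";

/* Hypothetical helper: copy `in` to `out`, collapsing every trigraph to its
   single-character equivalent. `out` must hold at least strlen(in) + 1 bytes. */
void replace_trigraphs(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        const char *hit = NULL;
        /* Guard against the terminator: strchr would otherwise match it. */
        if (in[0] == '?' && in[1] == '?' && in[2] != '\0')
            hit = strchr(TRI_THIRD, in[2]);
        if (hit != NULL) {
            *out++ = TRI_REPL[hit - TRI_THIRD];  /* three characters become one */
            in += 3;
        } else {
            *out++ = *in++;                      /* ordinary character, copy as-is */
        }
    }
    *out = '\0';
}
```

Fed the comment line above, the function leaves the first three question marks alone and collapses the final three characters to a single backslash. That trailing backslash is what joins exit(0); onto the comment in the next translation phase.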

As long as syntax highlighting is kept sane elsewhere in an obfuscated program, the reader may gradually come to trust it, perhaps letting an instance or two of trigraph-induced syntax highlighting failure slip past.

Of course, readers may run the equivalent of a search and replace, condensing trigraphs to their single-character equivalents. Since the C preprocessor (CPP) performs the same replacement before any other processing, this is safe. Digraphs, on the other hand, are dealt with during tokenization, meaning that a simple search-and-replace by the reader is not necessarily a safe transformation of the source code. Is it possible to include two important digraphs hidden amongst frivolous usage, such that

One possible example of the ‘false’ digraph would be embedding the characters inside another token, perhaps a multi-part string split across multiple lines. If a naive search-and-replace would convert the string into something syntax-breaking, then the reader may avoid doing a digraph conversion before reading the source, despite knowing such digraphs are there, and thus may be tricked into believing lies from their syntax highlighter.
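
A minimal sketch of the distinction, assuming a compiler with digraph support (C95 or later): the first function uses real digraphs as tokens, while the second places the same character pairs inside a string literal split across two lines, where they are plain data. The function names third_element, decoy and self_check are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Real digraphs: <: :> <% %> are tokens equivalent to [ ] { }. */
int third_element(void)
{
    int values<:3:> = <% 10, 20, 30 %>;  /* same as: int values[3] = { 10, 20, 30 }; */
    return values<:2:>;
}

/* 'False' digraphs: inside a string literal the same characters are just
   characters. The two adjacent literals are concatenated by the compiler. */
const char *decoy(void)
{
    return "<%"
           "x:>";
}

/* Checks both behaviours; a blind textual replacement of digraphs would
   leave the first check intact but change the string the second one sees. */
void self_check(void)
{
    assert(third_element() == 30);
    assert(strcmp(decoy(), "<%x:>") == 0);
}
```

A naive search-and-replace over this file would rewrite the string returned by decoy into "{x]", silently changing the program's data even though the tokenizer never treated those characters as digraphs.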

I suppose that leads to the natural question: Do people typically do a search-and-replace for digraphs when reading obfuscated code, or do they use a more language-aware method?