From 435f6fd05819ba1cb8af9c329fcb1c02dbabe25c Mon Sep 17 00:00:00 2001
From: Aaron Taylor
Date: Mon, 3 May 2021 14:16:27 -0700
Subject: [PATCH] Added a `brainstorming.md` file with misc ideas for future obfuscation projects.

---
 brainstorming.md | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)
 create mode 100644 brainstorming.md

diff --git a/brainstorming.md b/brainstorming.md
new file mode 100644
index 0000000..66d9f33
--- /dev/null
+++ b/brainstorming.md
@@ -0,0 +1,47 @@
+# Overview #
+
+A few thoughts for use in future obfuscated programs.
+
+
+# Digraphs, Trigraphs and Syntax Highlighting #
+
+At this point, trigraphs betray their presence by requiring compiler flags,
+making any direct benefit for obfuscation dubious.
+
+However, trigraphs cause many syntax highlighting packages to incorrectly
+highlight the source code. For example, the following code snippet frequently
+displays the `exit(0);` line as code rather than comment when processed by
+syntax highlighting programs which miss the trigraph `??/` converting to `\`,
+thereby escaping the newline and creating a two line comment.
+
+    // Should I exit early?????/
+    exit(0);
+
+As long as syntax highlighting is kept sane elsewhere in an obfuscated program,
+the user may gradually come to trust it, perhaps allowing an instance or two of
+trigraph-induced syntax highlighting failure to slip past the reader.
+
+Of course, readers may run the equivalent of a search and replace, condensing
+trigraphs to their single character equivalent. Since the CPP does an
+equivalent search and replace before performing any other processing, this is
+safe. On the other hand, digraphs are dealt with during the tokenization
+process, meaning that a simple search-and-replace by the user is not
+necessarily a safe transformation of the source code. Is it possible to include
+two important digraphs hidden amongst frivolous usage, such that
+
+ - one digraph breaks syntax highlighting in a useful way, like the example
+   demonstrated above, and
+
+ - the other digraph isn't a real digraph, rather being something which breaks
+   the program if digraphs are converted with a simple search-and-replace?
+
+One possible example of the 'false' digraph would be embedding the characters
+inside another token, perhaps a multi-part string split across multiple lines?
+If a naive search-and-replace would convert the string into something
+syntax-breaking, then the reader may avoid doing a digraph conversion before
+reading the source, despite knowing such digraphs are there, and thus may be
+tricked into believing lies from their syntax highlighter.
+
+I suppose that leads to the natural question: Do people typically do a
+search-and-replace for digraphs when reading obfuscated code, or do they use a
+more language-aware method?
-- 
2.20.1