
Code obfuscation

Stanisław Kubiak 07.06.2019

Have you ever wondered, dear readers, how to make your app safer? And I don’t mean avoiding unforeseen crashes, incorrect data parsing or (insert any error type here). Rather, I mean making your app resistant to external attacks, to those terrible crackers. So! This post is about code obfuscation, that is, deliberately making code harder to read.

Recently, we wrote a little about security, and more specifically about how to check whether a password has leaked onto the internet.
In this post, we will consider what can be done so that the source code of our application does not end up there as well.

Let’s start by defining what code obfuscation actually is. Several definitions exist. In general, obfuscating code means transforming the application so that the successive steps of its assembler code become harder to understand (or, in the extreme case, to break).
These changes can be applied at different stages. You can introduce them in:

source code,
bytecode,
machine code.

Please note that code obfuscation, like any other security measure, does not give you a 100% guarantee of security. It only puts an obstacle in the attacker’s way, which in practice can only delay the moment the application is broken.

Back to the subject: I have already mentioned that we can make changes at different stages of the program. I will focus on techniques related strictly to source code. I will take a simple program and try to make it minimally safer in a few steps.

Here is the application code:

#include <iostream>
#include <string>

// XOR every character with a fixed key; applying it twice restores the input.
std::string encrypt(std::string s)
{
    std::string out;
    for (auto i = 0u; i < s.size(); ++i)
        out += static_cast<char>(s[i] ^ 0xDD);
    return out;
}

// Encoded form of the expected password, computed once at startup.
static auto pass = encrypt("okon");

int main()
{
    std::string s;
    std::cout << "Password, please: ";
    std::cin >> s;
    if (encrypt(s) == pass)
    {
        std::cout << "Hello, Sir!\n";
        return 0;
    }
    else
    {
        std::cout << "Go away, peasant!\n";
        return 1;
    }
}

We will evaluate the "security" of the program using IDA (Interactive Disassembler), a tool used for reverse engineering.
Note! In fact, I will focus only on assessing the complexity of the code blocks generated by IDA. This will allow us, very loosely, to assess whether the obfuscation methods actually work. Therefore, please note that in this post we will not look at practical security verification of specific code obfuscation techniques.

Renaming

To begin with, the easiest of the techniques: renaming (of variables or methods). This transformation has little effect in compiled languages (e.g. C++), while it is more commonly used in scripting languages (e.g. JavaScript). It involves changing a name, e.g. verifyPin() to gdkfglndv030uigujnd(). Isn’t that simple? The new names can also be produced by encrypting the original variable or method names.

Unfortunately, this method, while trivial to apply, does not give much in terms of security.

Control flow

A rather enigmatic-sounding transformation, isn’t it? Basically, every program can be broken down into blocks in which successive operations are performed one by one, without jump instructions. Think of a block as a segment that holds a given part of the functionality (e.g. the logic of entering and validating the password). By analyzing the program, you can obtain a graph showing the transitions between these blocks, and this alone can be enough to guess how the application works (that is, how password validation affects user authorization).

We should ask ourselves how to modify our source code so that the generated blocks do not reveal our intentions. The aforementioned analysis is usually based on searching for specific patterns and sequences. Therefore, our task should be to disturb them.
For example, we can make the blocks of code sit side by side instead of following one another (hence the name: control flow flattening).

Ideally suited to this are the constructs that Uncle Bob, good scouts and Zbigniew Centipede all warn against, e.g. goto or incomprehensible stunts with switch.

Let’s look at the changes to the encrypt method from our example:

std::string encrypt(std::string s)
{
    std::string out;
    std::size_t i;
    int dummy = 1; // state variable that selects the next block to run

    while (true)
    {
        switch (dummy)
        {
        case 1: // loop initialization
        {
            i = 0;
            dummy = 2;
            break;
        }
        case 2: // loop condition and body
        {
            if (i < s.size())
            {
                out += static_cast<char>(s[i] ^ 0xDD);
                ++i;
            }
            else
            {
                dummy = 3;
            }
            break;
        }
        case 3: // exit
        {
            return out;
        }
        }
    }
}

Functionally, the method still does the same thing: it performs a simple encoding of the data given at its input. The difference is that its flow diagram is significantly different from the initial one.

Above is the base version of the encrypt method. Its control flow graph is trivial. Let’s see how the same method looks after our minor changes.

The encrypt method after control flow flattening

The effect is noticeable: the flow is now split across several blocks. The fact is that our changes still do not pose a serious obstacle for attackers, so let’s ~~keep spoiling the code~~ keep going!

Further obfuscation

Let’s take a look at the most ordinary addition of dead code: code that will never be called, or code that functionally does nothing.

Adding such fragments is meant to force the attacker to spend more time on our program, discarding assembler blocks that ultimately have no effect on the whole.

In addition, you can add blocks of code that are executed but do not change how the application works, e.g. code that performs seemingly complex mathematical operations on variables and ultimately leaves them in their original form. We can also use assembler operations that literally do nothing: nop.

Unfortunately, this particular technique has some drawbacks. Primo: we have to fight compiler optimizations. Modern compilers perform advanced optimizations: they can reorganize the structure of the code and (depending on how aggressive an optimization level we allow) remove pieces of code that do nothing (but there is a rescue: volatile can help here). All of this so that we end up with a smaller binary and/or a faster application. Secundo: the more code, the larger the binary. Under certain specific conditions, this can be a really serious problem.

So, let’s say we’ve added more blocks of code. We have coped with the compiler optimizations. The size of the binary does not scare us. Then let’s go one step further and introduce some randomness among the assembler blocks of our application.

Let’s look at the changes made to the encrypt method. For the record, we added: blocks of code that never execute, blocks of code that do not change anything, and a random ordering of the segments.

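std::string encrypt(std::string s)
{
    std::string out;
    std::size_t i;
    int dummy = 1;

    while (true)
    {
        switch (dummy)
        {
        case 3: // exit
        {
            return out;
        }
        case 2: // loop condition and body
        {
            if (i < s.size())
            {
                out += static_cast<char>(s[i] ^ 0xDD);
                ++i;
            }
            else
            {
                dummy = 3;
            }
            break;
        }
        case 1: // loop initialization
        {
            i = 0;
            dummy = 2;
            // no break: deliberately falls through into case 42
        }
        case 42: // changes nothing: i is 0 here, so 0 * 150 is still 0
        {
            i *= 150;
            break;
        }
        case 1337: // dead code: dummy never takes this value
        {
            return "dfjdjlfd*#&$@*#0832740238742";
        }
        case 0: // dead code: dummy never takes this value
        {
            dummy = 1;
            break;
        }
        }
    }
}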

Again, let’s verify the changes with our advanced application security assessment method.

Changed order of execution of successive blocks in the encrypt method

Apparent conditions

Another technique worth knowing when obfuscating code is to use seemingly complex expressions that decide how the program executes. These expressions return a value known in advance; combined with if-else statements, they give an apparent branching of our application’s execution. This is, of course, an opaque predicate (unfortunately I was unable to find an official Polish translation, which is why I stick with the English name).

Let’s try to apply it in our example. Here is a new function that we will add to our program, followed by its use in main().

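// Opaque predicate: looks like it computes something, but simply returns
// the original argument converted to bool (true for any non-zero input).
bool verify(int a)
{
    volatile int b = a;   // remember the original value
    a = 150 * 28;         // noise
    volatile int c = a;   // noise
    a ^= 1336;            // noise
    return (b);
}

int main()
{
    std::string s;
    std::cout << "Password, please: ";
    std::cin >> s;
    if (verify(150)) // always true
    {
        if (encrypt(s) == pass)
        {
            std::cout << "Hello, Sir!\n";
            return 0;
        }
        else
        {
            std::cout << "Go away, peasant!\n";
            return 1;
        }
    }
    else
        return 0xABCD; // never reached
}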

What else

You may be tempted by further code obfuscation techniques, some of which are:

recursion,
modifying the layout of the processed data so that its type is not immediately obvious,
composing data into larger structures that hide the actual type,
using external sources (entropy) to control the flow in the application,
using a "god object", that is, an object that implements many, if not all, of the functionalities of a given component,
creating very long methods, breaking the rules of clean code (a method should not be longer than 10 LOC),
many others.

Conclusion

Code obfuscation is definitely a complex process. It requires a lot of creativity, knowledge of how the compiler works, often of advanced constructs of a given programming language, of the specific features of operating systems, of attack patterns, etc.

When you set about obfuscating code, you should also keep in mind a few problems that result from it.
Namely:

application performance: when using obfuscation techniques, we often consciously implement suboptimal solutions or make it harder for the compiler to apply changes that would speed the code up; on the other hand, when creating applications with a GUI, for example, we care about a high FPS count, so it is worth considering obfuscating only the sensitive areas of the application and leaving the performance-critical ones unchanged,
debugging our own code can become problematic,
the size of the binary file may increase, which can be of colossal importance for embedded systems, for example,
it will certainly be more difficult to maintain code that has undergone these transformations.

I strongly encourage you to explore code obfuscation. This post only signals that the topic exists: I did not focus on the details, I did not use IDA in an advanced way, I did not delve into the assembler code, and I did not touch on obfuscation methods at the intermediate or machine code stage.