Undefined but simple anti-decompiling instruction

TL;DR

UD2 is an x86 assembly instruction that simulates an invalid opcode. It is mostly used for testing purposes, but not only: malware authors, for example, can use it to disturb the decompilation of their malware.

UD2 instruction

UD2 is an x86 assembly mnemonic that stands for Undefined instruction. Only used for testing purposes, this instruction simulates the presence of an invalid opcode in the code and, when executed, raises an invalid opcode exception (#UD).

Opcode	Mnemonic
0F 0B	UD2

Microsoft Visual C++ example

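//https://fr-academic.com/dic.nsf/frwiki/1672713
// MSVC only: relies on structured exception handling (__try/__except) and
// on inline __asm, which MSVC supports for 32-bit x86 targets.
#include <windows.h>
#include <iostream>

// Returns true if executing UD2 raised an illegal-instruction exception.
// (Named raises_ud2 because __ud2 is already the name of an MSVC intrinsic.)
bool raises_ud2(void)
{
    __try {
        __asm { UD2 }
    }
    __except (GetExceptionCode() == EXCEPTION_ILLEGAL_INSTRUCTION ?
              EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)
    {
        return true;
    }
    return false;
}

int main(void)
{
    if (raises_ud2())
        std::cout << "Invalid opcode exception encountered";
    else
        std::cout << "Invalid opcode exception not encountered";

    return 0;
}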

Anti-decompiling purpose

When I gave you the definition of the UD2 instruction, I said that it’s only used for testing purposes. Well, I lied. It can also be used as an anti-decompiling technique. To explain this usage, let’s look at a simple program.

Simple program

The following program checks whether it was executed with an argument and exits if not. Otherwise, it first checks that the length of the string passed as argument is 18, and then that the string is equal to G0od_Byp4ss_0f_UD2. If both conditions are met, it prints ‘Good boy.’ and exits; if not, it returns after printing a ‘Try again’ message.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>



int main(int argc, char **argv){

    if(argc < 2){
        printf("Usage : %s password\n", argv[0]);
        exit(1);
    }
    if(strlen(argv[1]) == 18){
        if(strcmp(argv[1], "G0od_Byp4ss_0f_UD2") == 0){
            printf("Good boy.\n");
            exit(0);
        }
    }
    printf("Try again !\n");
    return 1;
}

Now let’s decompile our program. For this example I’m using Ghidra’s decompiler.

void main(ulong argc, ulong *argv)

{
    int32_t iVar1;
    int64_t iVar2;
    ulong extraout_RDX;
    char *pcVar3;
    uint64_t uVar4;
    ulong s1;
    ulong var_4h;

    pcVar3 = argv;
    if (argc < 2) {
        pcVar3 = *argv;
        printf("Usage : %s password\n", pcVar3);
        exit(1);
    }
    iVar2 = strlen(argv[1]);
    if (iVar2 == 0x12) {
        pcVar3 = "G0od_Byp4ss_0f_UD2";
        iVar1 = strcmp(argv[1], "G0od_Byp4ss_0f_UD2");
        if (iVar1 == 0) {
            puts("Good boy.");
            exit(0);
        }
    }
    puts("Try again !");
    uVar4 = 1;
    return uVar4;
}

We can see that the pseudocode generated by the decompiler is very similar to our original source code, for two main reasons:

1- The program is very simple

2- The program doesn’t implement any anti-reverse-engineering method

“Protected” simple program

To avoid being lynched by the reverse engineering community: I used the word “protected” in the title only to distinguish the two programs. In real cases this kind of protection is much more complex than what I will show you. Now that I’ve covered my back, we can continue…

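Let’s take our simple program and throw a UD2 instruction in the middle of the source code. Here we use the __asm keyword to include an assembly instruction in our C code. We will not care about exception handling: run with an argument, this build simply dies on the illegal instruction, but all we want is to see what the decompiler does with it.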

#include <stdio.h>
#include <string.h>
#include <stdlib.h>



int main(int argc, char **argv){

    if(argc < 2){
        printf("Usage : %s password\n", argv[0]);
        exit(1);
    }
    __asm("UD2;");
    if(strlen(argv[1]) == 18){
        if(strcmp(argv[1], "G0od_Byp4ss_0f_UD2") == 0){
            printf("Good boy.\n");
            exit(0);
        }
    }
    printf("Try again !\n");
    return 1;
}

Finally, after re-decompiling our program, we end up with the following pseudocode:

void main(ulong argc, ulong *argv)

{
    ulong var_10h;
    ulong var_4h;

    if (argc < 2) {
        printf("Usage : %s password\n", *argv);
        exit(1);
    }
    do {
        invalidInstructionException();
    } while( true );
}

We can see that instead of continuing, the decompiler stopped after emitting an infinite loop that calls the invalidInstructionException() function. My personal explanation is that, during its analysis, the decompiler detected an invalid opcode (UD2) and deduced that the program logically ends there with an invalid instruction exception, so everything after it was dropped as unreachable. That is not wrong when UD2 is used for testing purposes, but in an anti-reverse-engineering context the author will usually handle this exception so that the program’s workflow is not affected.

“Protection” bypass

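To bypass the “protection” in our simple program, we can simply patch the binary by replacing the UD2 instruction with two NOP instructions. UD2 is two bytes long (0F 0B) and NOP is a single byte (90), so two NOPs fill the gap exactly and the following instructions keep their addresses.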

Before patching

...
0x000011f8      call    printf                              ; sym.imp.printf ; int printf(const char *format) ; sym.imp.printf
; int printf("Usage : %s password\n")
0x000011fd      mov     edi, 1                              ; int status
0x00001202      call    exit                                ; sym.imp.exit ; void exit(int status) ; sym.imp.exit
; void exit(38161477)
0x00001207      ud2
0x00001209      mov     rax, qword [rbp - 0x10]
0x0000120d      add     rax, 8
0x00001211      mov     rax, qword [rax]
0x00001214      mov     rdi, rax
0x00001217      call    strlen
...

After patching

...
0x000011f8      call    printf                              ; sym.imp.printf ; int printf(const char *format) ; sym.imp.printf
; int printf("Usage : %s password\n")
0x000011fd      mov     edi, 1                              ; int status
0x00001202      call    exit                                ; sym.imp.exit ; void exit(int status) ; sym.imp.exit
; void exit(38161477)
0x00001207      nop
0x00001208      nop
0x00001209      mov     rax, qword [rbp - 0x10]
0x0000120d      add     rax, 8
0x00001211      mov     rax, qword [rax]
0x00001214      mov     rdi, rax
0x00001217      call    strlen
...