
control.rip

Leveraging glibc in exploitation - Part 3: Defenses

In the previous blog post in this series, we examined the memory layout of a program at runtime on Linux and how glibc fits in with that model. In particular, we looked at the call stack, how it can be abused to leak information, and its relationship with ASLR. In this post, we will look at a purposely-vulnerable example program and its built-in defenses in preparation for hacking it.


Introducing the “big-roi” server example

So far, we have examined some relatively low-level computing concepts, such as the call stack layout, the delicate placement of glibc data within that layout, and rules about how that data can be accessed. To put these concepts into practice, I have written a purposely-vulnerable networked service named “big-roi” (Bad Insecure Grotesquely Really Obviously Insecure).

This GitLab repository contains the example’s source code, a compiled executable, and the objdump disassembly, which can be used to quasi-authenticate the executable. If you opt to compile the program, it is unlikely function addresses and other compiler-related properties will match those discussed in these blog posts. The executable will still work and be vulnerable to the attacks discussed in this series, but it will make following along with code snippets difficult.

When compiling or running big-roi, I recommend doing so in an Ubuntu 20.04 environment, which is what I used when creating big-roi. This increases the likelihood that gcc will produce an executable with the same mitigations applied. As we discussed in part two, glibc fingerprinting tools might not know about glibc libraries compiled for less-common Linux distributions. Using a popular distribution like Ubuntu decreases the chances of that.

In this imaginary scenario, a developer was experimenting with implementing a lightweight file server for Linux-based OSes with optional password protection. In the process of attaining “high ROI and velocity”, they implemented an application with some significant security challenges.

The application takes two arguments: a TCP port to listen on, and a file path to share. The user can optionally set a password by generating a bcrypt password string, and setting an environment variable equal to the bcrypt string. For example:

# Note: This requires openssl.
# Refer to "man openssl-passwd" for more information.
$ export PASSWORD_BCRYPT=$(openssl passwd -5)
# <Type in a password and confirm it>
$ echo 'keith says to forget about it!' > /tmp/secret-data
$ ./big-roi 6666 /tmp/secret-data

Users can retrieve the file by connecting to the process over TCP and sending a password:

# Note: Make sure to not include a trailing newline character.
$ printf '%s' 'gfy' | nc 127.0.0.1 6666
incorrect password: gfy
$ printf '%s' '<actual password>' | nc 127.0.0.1 6666
keith says to forget about it!

Our goal in the next post will be to bypass the program’s password check by leveraging glibc. Before we can do that, however, we need to develop a better understanding of the defenses offered by the compiler. These defenses will constrain our exploit’s capabilities and design. Feel free to look at the source code - we will be covering it in the next blog post. For now, let’s focus on the executable included in the git repository.

Discovering security mitigations

It is common for hacking challenges to disable security mitigations to make exploitation easier, or to demonstrate a particular attack. One goal I have for this example is to exploit the vulnerable program without disabling any security mitigations. That raises a few questions:

  • What kind of security mitigations are there to begin with?
  • What is the relationship between any security mitigations and the final executable file that gcc spits out?

The most relevant mitigation is probably ASLR, which we covered in part two. ASLR is provided by the operating system via the Linux kernel. In this scenario, we will assume the computer running the vulnerable program has ASLR enabled.
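
You can check whether the kernel has ASLR enabled through procfs. The snippet below is a quick sanity check (assuming a Linux system; on most modern distributions `cat` and `grep` are themselves PIEs, so each new process gets fresh mappings):

```shell
# 0 = ASLR disabled, 1 = randomize stack/mmap/VDSO, 2 = also randomize brk
cat /proc/sys/kernel/randomize_va_space

# With ASLR on, these two lines print different base addresses,
# because each grep is a brand-new process with its own mappings:
grep 'r-xp' /proc/self/maps | head -n 1
grep 'r-xp' /proc/self/maps | head -n 1
```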

Discovering security mitigations added by the compiler is a bit more involved. gcc itself can provide some of this information by specifying the -v argument:

$ gcc -o /tmp/x -v -g big-roi.c -lcrypt
# ...
/usr/lib/gcc/x86_64-linux-gnu/9/cc1 -quiet -v \
    -imultiarch x86_64-linux-gnu big-roi.c -quiet -dumpbase big-roi.c \
    -mtune=generic -march=x86-64 -auxbase big-roi -g -version \
    -fasynchronous-unwind-tables -fstack-protector-strong -Wformat \
    -Wformat-security -fstack-clash-protection -fcf-protection \
    -o /tmp/ccjIxDk4.s
# ...

Unfortunately, the output is pretty noisy (I cleaned it up quite a bit here for readability). The -f<feature> arguments and values provide some insight into security mitigations applied by gcc. This is of course not very useful when working with an existing executable or when the developer keeps the log output from gcc secret.

For a more succinct analysis, I recommend using either rabin2 (part of radare2) or rz-bin (part of rizin). At the time of this writing, radare2 is more readily available from package managers. The rizin toolkit is a recent fork of radare2 that aims to improve code quality and security - both of which are very important for reverse engineering and binary exploitation.

Both tools report on an executable’s attributes, including security mitigations, when supplied with the -I argument. Here is what rz-bin says:

$ rz-bin -I big-roi
[Info]
arch     x86
# ...
relro    full
# ...
canary   true
PIE      true
RELROCS  true
NX       true

Based on the output above, we can see the executable has the following mitigations applied:

  • NX - No-execute (in this context, specifically non-executable call stack)
  • canary - Call stack canaries
  • PIE - Position independent code / position independent executable
  • relro full - Relocation read-only in full mode

In the next sections, we will take a look at these mitigations, and what they mean for developing an exploit.

No-execute (NX)

One common security mitigation you may have heard of is no-execute (NX). Like many things in computing, this term is vague, overloaded, and used inconsistently. NX, or more generally “executable space protection”, refers to marking chunks of memory as non-executable. 1 In the output of rabin2 and rz-bin, “NX” refers to marking only the call stack as non-executable.

From a hacker’s perspective, an executable stack makes it easier to pivot to code they control. Imagine if we overflow a stack-based buffer with data that also contains executable code. If we can find the address of the buffer, or locate a CPU instruction that jumps execution to the buffer, we could just run whatever code we want as a part of the buffer overflow attack. This would simplify the attack into a few steps:

  1. Create a blob of data that overflows the buffer which contains:
    • Executable code, such as exec‘ing /bin/sh
    • A trailing pointer to CPU instructions that jump execution into the call stack buffer (this overwrites the saved return instruction pointer of the function, which happens to be stored just after the buffer variable)
  2. Write the blob to the vulnerable process
  3. The vulnerable function runs and overwrites the saved return instruction pointer with a hacker-controlled address (this new address points at CPU instructions that jump into the same buffer that triggered the buffer overflow)
  4. Execution jumps to the buffer when the vulnerable function returns
  5. /bin/sh runs

Such an attack requires less work on the hacker’s part to design and test, making the exploit simpler and more reliable. While a non-executable stack does not stop stack-based overflows, it makes exploiting them more complicated. A non-executable stack forces the hacker to build their initial exploit from code contained within the program or from dynamically linked libraries.

In effect, the hacker can no longer immediately execute code of their choosing. They must work with whatever code happens to be provided by the vulnerable program to pivot to hacker-injected code. This is known as “return-oriented programming” and we will be discussing it in part four.

The minutiae of executable space protection

The history of executable space protection is, unfortunately, somewhat lost to time. In modern computers, the CPU’s memory management unit (MMU) is responsible for the translation of virtual memory to physical memory in the form of memory pages. The MMU also manages memory permissions (e.g., read, write, and execute) and, in concert with the OS, access to memory pages. This is known as “memory protection”. 2 One of the earliest examples of a system resembling an MMU with memory protection capabilities is the 1960s-era Burroughs B5000 mainframe. It featured a memory tagging architecture which, according to Alastair J.W. Mayer, made it “impossible to execute data, or to use code as operands.” 3 This starkly contrasts with modern processors such as x86.

Fast forward to 1996 - Aleph One demonstrated how stack-based buffer overflows could be utilized to execute exploit code in stack memory on x86 CPUs. 4 Following Aleph One’s Phrack article, Solar Designer published a patch for the Linux kernel that implemented non-executable call stack memory on 32-bit x86 CPUs. 5 One limitation Solar Designer noted about their patch was that other data segments (bss, heap) still remained executable and would remain targets for injected exploit code.

Several months later, Rafal Wojtczuk demonstrated that stack-based buffer overflows could still be exploited by abusing this limitation, among other novel strategies that avoided executing code in call stack memory. 6 In response to Wojtczuk, Solar Designer noted that x86 CPUs of the time did not support marking memory as non-executable at the page-level. 6 As a result, other data-type memory segments remained executable. It was not until 64-bit x86 CPUs that page-level execution permission was supported in the form of the “NX bit”. 7

In modern Linux kernels, programs and libraries can still be compiled with an executable call stack even if the CPU supports disabling page-level execution. Doing so sets a flag in the ELF file which tells Linux that the program or library requires an executable stack. 8 Modern versions of gcc default to compiling with a non-executable call stack. This can be overridden by specifying the -z execstack argument to gcc.
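
The effect of -z execstack can be observed directly with readelf. The sketch below (assuming gcc and binutils are installed; the /tmp paths are arbitrary) compiles the same trivial program twice and compares the GNU_STACK program header:

```shell
# A trivial program to compile two ways:
cat > /tmp/stack-demo.c <<'EOF'
int main(void) { return 0; }
EOF

gcc -o /tmp/stack-noexec /tmp/stack-demo.c               # default: NX stack
gcc -o /tmp/stack-exec -z execstack /tmp/stack-demo.c    # opt back in

# The GNU_STACK program header flags show the difference:
readelf -l /tmp/stack-noexec | grep -A1 GNU_STACK   # flags: RW
readelf -l /tmp/stack-exec   | grep -A1 GNU_STACK   # flags: RWE
```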

An executable stack can also be inadvertently enabled if a program relies on function trampolines. 8 According to the GCC Internals manual, a trampoline can be introduced if a program relies on nested functions:

GCC has traditionally supported nested functions by creating an executable trampoline at run time when the address of a nested function is taken. This is a small piece of code which normally resides on the stack, in the stack frame of the containing function. The trampoline loads the static chain register and then jumps to the real address of the nested function. 9

Users can view and control the executable stack flag of an existing program or library using the execstack program. 8

Call stack canaries

A call stack canary is a secret chunk of data that gcc inserts into the call stack between the programmer’s local variables and the saved process state. The compiler also inserts code into the program that verifies the canary value when a function finishes execution. If the verification code finds that the canary has been modified, it writes an error message (to syslog if no TTY is available) and attempts to exit the process by first raising the abort signal (SIGABRT) followed by calling _exit with a status code of 127. 10 11 This means we cannot simply overrun a stack buffer and overwrite process state saved on the “other side” of the canary.

gcc’s original implementation of this feature was called “StackGuard”. It is covered in detail by Wagle and Cowan in their 2003 article “StackGuard: Simple Stack Smash Protection for GCC”. 12

To illustrate this functionality, let’s imagine we have a function that allows for a stack-based buffer overflow. The buffer variable will be named buf, will be sized 64 bytes, and will be “zeroed” out. Here is what happens before and after buf is overflown.

Note, if the direction of the call stack does not make sense in the diagram below, I recommend checking out Eli Bendersky’s blog on the subject, which explains how the call stack is placed “head down” in Intel x86. 13

# (1) Stack frame of         ---|--> (2) Stack frame after buf
#     vulnerable function       |        is overflown with
#     right before buf          |        80 x "A" and a hacker
#     is written to:            |        controlled address:
# +------------------------+    |    +------------------------+
# | saved return           |    |    | saved return           |
# | instruction pointer    |    |    | instruction pointer    |
# | (points to             |    |    | (now points to         |
# | previous function)    |    |    | hacker-specified code) |
# | 0x0000555555555786     |    |    | 0x000000000bad1dea     |
# +------------------------+    |    +------------------------+
# | saved rbp (8 bytes)    |    |    | overwritten rbp addr   |
# | 0x00007fffffffe550     |    |    | 0x4141414141414141     |
# +------------------------+    |    +------------------------+
# | stack canary (8 bytes) |    |    | overwritten canary     |
# | 0x00fe1337b0bafe77     |    |    | 0x4141414141414141     |
# +------------------------+    |    +------------------------+
# | buf (64 bytes)         |    |    | buf (64 bytes)         |
# | 0x00 x 64              |    |    | 0x41 x 64              |
# +------------------------+    |    +------------------------+
# | (lower memory addrs)   |    |    | (lower memory addrs)   |

The positioning of the canary value guarantees that any buffer overflow that overwrites the process state (in this case, either the saved rbp register address or the saved return instruction pointer) will also overwrite the canary value. When the function returns, the verification code will notice the modified canary, and exit the process - thus preventing the hacker from subverting the process’s control flow.

Generally speaking, the stack canary will be one machine word long - e.g., on a 64-bit CPU it will be 64 bits, or 8 bytes, of data. The first byte is usually set to 0x00 (null), which theoretically prevents a hacker from using a function like strlen (which scans memory until it finds a null byte) to read data past a non-null-terminated stack buffer. Ordinarily, such a bug would reveal the stack canary value and other sensitive process state. As we saw in part two, such a mistake is easy to make, and gcc’s addition of a null terminator by itself can be the difference between a working program and an information leak.

The remaining bits of the canary consist of random data that do not change throughout the program’s execution (at least, as of the current implementation of gcc). 14 The canary is also copied to a forked process, which means hackers can potentially abuse a forking program to brute-force the secret value by changing one byte at a time and checking if the process has exited using a crash oracle (e.g., a TCP reset). A non-forking process would prevent such an attack, as a new canary value would be generated each time the program restarts.

Position-independent code (PIC)

Position-independent code (PIC - or, when applied to a whole executable, a position-independent executable, PIE) allows code to be loaded at any memory address. 15 While not required for ASLR, PIC enables a more thorough ASLR implementation: it allows a library or executable’s segments - and the sections contained within them - to be mapped to random addresses.

This can be seen by compiling a program with gcc using the -no-pie argument (modern versions of gcc default to compiling PIC/PIE) and comparing the executable’s segment virtual addresses (where the segments will be mapped at runtime) to /proc/<pid>/maps:

$ gcc -o /tmp/big-roi -no-pie -g big-roi.c -lcrypt
$ /tmp/big-roi 6666 /tmp/foo &
[1] 371
$ rz-bin -SS /tmp/big-roi
[Segments]
paddr      size  vaddr      vsize align  perm name
-----------------------------------------------------------
0x00000040 0x2d8 0x00400040 0x2d8 0x8    -r-- PHDR
0x00000318 0x1c  0x00400318 0x1c  0x1    -r-- INTERP
0x00000000 0xbb8 0x00400000 0xbb8 0x1000 -r-- LOAD0
0x00001000 0xa61 0x00401000 0xa61 0x1000 -r-x LOAD1
0x00002000 0x408 0x00402000 0x408 0x1000 -r-- LOAD2
0x00002e00 0x310 0x00403e00 0x330 0x1000 -rw- LOAD3
# Notice the overlap of the segments' addresses (the "vaddr"
# column) from above with the executable's mappings at runtime:
$ cat /proc/371/maps
00400000-00401000 r--p 00000000 08:01 2252012 /tmp/big-roi
00401000-00402000 r-xp 00001000 08:01 2252012 /tmp/big-roi
00402000-00403000 r--p 00002000 08:01 2252012 /tmp/big-roi
00403000-00404000 r--p 00002000 08:01 2252012 /tmp/big-roi
00404000-00405000 rw-p 00003000 08:01 2252012 /tmp/big-roi

Re-compile as position independent code and compare the segments’ addresses with the runtime mappings:

$ rm /tmp/big-roi; gcc -o /tmp/big-roi -g big-roi.c -lcrypt
$ /tmp/big-roi 6666 /tmp/foo &
[1] 381
$ rz-bin -SS /tmp/big-roi
[Segments]
paddr      size  vaddr      vsize align  perm name
-----------------------------------------------------------
0x00000040 0x2d8 0x00000040 0x2d8 0x8    -r-- PHDR
0x00000318 0x1c  0x00000318 0x1c  0x1    -r-- INTERP
0x00000000 0xce8 0x00000000 0xce8 0x1000 -r-- LOAD0
0x00001000 0xa71 0x00001000 0xa71 0x1000 -r-x LOAD1
0x00002000 0x410 0x00002000 0x410 0x1000 -r-- LOAD2
0x00002cc8 0x348 0x00003cc8 0x368 0x1000 -rw- LOAD3
# This time, check out how the segments' addresses from above
# are completely different from the runtime memory mappings:
$ cat /proc/381/maps
55e36aca1000-55e36aca2000 r--p 00000000 08:01 2252012 /tmp/big-roi
55e36aca2000-55e36aca3000 r-xp 00001000 08:01 2252012 /tmp/big-roi
55e36aca3000-55e36aca4000 r--p 00002000 08:01 2252012 /tmp/big-roi
55e36aca4000-55e36aca5000 r--p 00002000 08:01 2252012 /tmp/big-roi
55e36aca5000-55e36aca6000 rw-p 00003000 08:01 2252012 /tmp/big-roi

Relocation read-only (RELRO)

In computing, the mapping of a function symbol to a memory address is known as a “relocation”. 16 Dynamically linked programs rely on relocations to execute external functions. Without a relocation resolution capability, dynamically linked code would need to make bad assumptions about where it can find external functions. David Tomaschik goes into fine detail about this and RELRO in his excellent “GOT and PLT for pwning” blog post. 17 I will be summarizing his post in this section; I recommend reading it in full for more detail on this subject.

In a dynamically linked ELF file, the compiler substitutes calls to external functions, variables, and other external data with pointers to the Procedure Linkage Table (PLT). The PLT is located in the .plt section of the executable. When an external function is first called, the program executes code stored in the PLT. This “stub” code then checks the Global Offset Table (GOT) for the next address of code it should execute. The GOT is located in the executable’s .got.plt section. Without RELRO, both of these sections are mapped into memory with read-write permissions.

The very first time this happens, the GOT already contains a pointer back to the PLT stub code that got us here (well, a few instructions into the stub code to prevent an infinite loop). The PLT code then resumes and triggers the dynamic linker to look up the address of the external function. The resolved address is then written to the GOT - overwriting the pointer that was pre-populated there.

Now the GOT entry for the function contains the actual address of the external function. From this point forward, the PLT stub will now jump execution to the external function simply because that is where the GOT entry points.

This is illustrated below (note, this is a simplified drawing):

// Here is what happens the first time printf is called:
printf("foo\n");
// (1)-> printf@plt (2)-> printf@got == printf@plt+6
//           |                |
//           |                V
//           + <-(3)----------+
//           |
//           V
//      linker finds
//      printf at
//      0x00012345
//      and updates
//      the GOT
//           |
//          (4)
//           |
//           V
//       printf@0x00012345("foo\n");

// And here is what happens for all future printf calls:
printf("bar\n");
// (1)-> printf@plt (2)-> printf@got == 0x00012345
//                            |
//                            V
//                        printf@0x00012345("bar\n")

Relocation read-only (RELRO) comes in two forms: “partial” and “full”. The “full” mode works by having the dynamic linker look up the addresses of all external code when a program starts up, rather than doing so when the external code is executed or accessed (the latter is known as “lazy” binding). The mapped sections where the resolved addresses are written are then marked read-only. Full RELRO appears to be the default behavior in modern versions of gcc.

Partial mode, as its name implies, is less effective than the full mode. Even with partial RELRO, the section where resolved addresses are saved remains writable for the duration of the program’s execution.

To summarize, full RELRO prevents us from overwriting any data in the GOT with pointers to code that we want to run instead.
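
The difference between the two modes is visible in an executable’s dynamic section. This sketch (assuming gcc and binutils are installed; the paths and explicit linker flags are chosen for illustration) links the same program with lazy and immediate binding, then checks for the BIND_NOW flag that accompanies full RELRO:

```shell
cat > /tmp/relro-demo.c <<'EOF'
int main(void) { return 0; }
EOF

# Lazy binding ("partial" RELRO) vs. immediate binding ("full" RELRO):
gcc -o /tmp/relro-partial -Wl,-z,relro,-z,lazy /tmp/relro-demo.c
gcc -o /tmp/relro-full    -Wl,-z,relro,-z,now  /tmp/relro-demo.c

# Full RELRO is marked by BIND_NOW / NOW flags in the dynamic section;
# the partially-protected binary has no such flag:
readelf -d /tmp/relro-full    | grep -i 'now'
readelf -d /tmp/relro-partial | grep -ic 'now'   # prints 0
```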

Summary

In this post, we took a quick look at an example program, the security mitigations applied to it by gcc, and what those mitigations mean for us. This includes:

  • NX and executable space protection
  • A little bit of history about executable space protection in x86
  • Call stack canaries, their inner workings, and their limitations
  • Position-independent code
  • Relocation read-only and how it works

In part four, we will finally apply everything we have learned to hack the program and bypass its password protection.

References


  1. wikipedia.org. 2022, July 8. “Executable space protection”. ↩︎

  2. wikipedia.org. 2022, May 1. “Memory protection”. ↩︎

  3. Mayer, Alastair J.W.. 1982, June. “The Architecture of the Burroughs B5000 - 20 Years Later and Still Ahead of the Times?”. Note 1: Also at smecc.org Note 2: A huge thank you to Thomas Strömberg for pointing out the B5000’s relevance to non-executable call stack. I also recommend reading the Burroughs large systems and Memory management unit Wikipedia pages on these respective subjects for more context. ↩︎

  4. One, Aleph. 1996, November 8. “Smashing The Stack For Fun And Profit”. ↩︎

  5. Designer, Solar. 1997, April 12. “Linux kernel patch to remove stack exec permission”. Note: Sent Sat, 12 Apr 1997 13:03:07 -0300, well done. ↩︎

  6. Wojtczuk, Rafal. 1998, January 30. “Defeating Solar Designer’s Non-executable Stack Patch”. ↩︎

  7. wikipedia.org. 2022, February 15. “NX bit”. ↩︎

  8. linux.die.net. Accessed: 2022, April 10. “execstack(8) - Linux man page”. ↩︎

  9. gcc.gnu.org. Accessed: 2022, August 2. “18.11 Support for Nested Functions - GNU Compiler Collection (GCC) Internals”. ↩︎

  10. GNU Compiler Collection. 2005, July 2. “static void fail() - libssp/ssp.c (version 12.1.0)”. Note: Similar code can be found in glibc, though I am not entirely sure what its purpose is: glibc: debug/stack_chk_fail.c ↩︎

  11. GNU Compiler Collection. 2021, August 17. “#define abort() - gcc/tsystem.h (version 12.1.0)”. ↩︎

  12. Wagle, Perry and Cowan, Crispin. 2003. “StackGuard: Simple Stack Smash Protection for GCC”. ↩︎

  13. Bendersky, Eli. 2011, February 4. “Where the top of the stack is on x86”. ↩︎

  14. GNU Compiler Collection. 2005, July 2. “static void __attribute__ ((constructor)) __guard_setup () - libssp/ssp.c (version 12.1.0)”. ↩︎

  15. wikipedia.org. 2022, January 2. “Position-independent code”. ↩︎

  16. wikipedia.org. 2021, September 25. “Global Offset Table”. ↩︎

  17. Tomaschik, David. 2017, March 19. “GOT and PLT for pwning”. ↩︎