Leveraging glibc in exploitation - Part 1: What is glibc?

The GNU C Library (glibc) is an open-source implementation of the C standard library that is primarily found on Linux-based operating systems. It provides a powerful set of APIs that simplify interacting with operating systems, as well as functions and code for creating programs. From a hacker’s perspective, this functionality can be repurposed to extend an exploit’s capabilities and viability. In this series, we will explore leveraging glibc to exploit a vulnerable program on an x86-64 CPU.

Special thanks and dedication

I would like to thank Chris Marget and Thomas Strömberg for their assistance in writing this series. Thomas spent many hours reviewing my work and providing me with feedback while in the middle of a cross-country move. He also pushed me to research topics that needed more depth - and it really shows in the final piece. Chris also reviewed this series. His willingness to listen to my crazy ideas and rants is deeply appreciated.

Thank you both for your help.

This series is dedicated to my late friend Alex Bedard, who was one of the first people to introduce me to the world of computing. His guidance and kindness gave me the confidence to explore on my own. My great journey into computing would not have happened without his friendship.

Overview

If hacking is the process of changing how a system works by leveraging its capabilities in unexpected ways, then the GNU C library can be a good source of such capabilities. Hackers are often limited to constructing their exploit from code contained in the vulnerable program itself. Code provided by external libraries can also be utilized by hackers if a vulnerable program depends on them. Even if a program uses only a fraction of a library’s functionality, the architecture of dynamic linking makes it possible for an exploit to use any code contained in said library.

The following posts will examine basic concepts involved in leveraging a dynamically linked library like glibc in an exploit, including a demonstration of an exploit against an example program.

These posts are intended for people new to binary exploitation and C programming. That said, you may still find this series helpful even if you do not fit in those categories. This topic is surprisingly poorly documented despite being relatively well understood.

Throughout these posts, I will reference some basic examples of shell commands and C code. If you would like to follow along, I recommend using a Linux VM or a Docker container. I used Ubuntu 20.04 with the gcc and gdb packages.

What is the C standard library?

The general purpose of the C standard library is to provide programmers with a standardized set of programming interfaces for writing a program in C. Operating systems provide their own diverse set of interfaces for interacting with various components like stateful storage, networking, and threads of execution. The C standard library attempts to simplify this by providing a stable set of programming interfaces that abstract the underlying OS-specific interfaces and code.

Confusingly, many people in the hacking community will refer to the GNU C library as “libc” and “C library”. The GNU C library (glibc) is only one implementation of the C standard library; however, it is the most common one you will find in Linux distributions. A notable exception is Alpine Linux, which uses musl libc.

In binary exploitation, it is critical to understand exactly which implementation you are working with. Failing to account for details like the C standard library implementation, build, or version will result in undesirable behavior such as crashing the vulnerable program. We will take a look at these nuances in part two.

Programmers can also bypass the C standard library and consume operating system programming interfaces directly. Such interfaces are generally referred to as “system calls” or “syscalls”. They are ordinarily invoked by placing the desired syscall’s identifier (an integer) and its arguments in CPU registers according to the platform’s syscall calling convention, which is defined in the Application Binary Interface (ABI), and then executing a special instruction that transfers control to the kernel. ¹ ²
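To make the distinction concrete, here is a minimal sketch that asks the kernel for the process ID by syscall number rather than through the getpid() wrapper. It assumes Linux, and the helper name raw_getpid is hypothetical. Note that syscall() is itself a small glibc function; it is used here only because it stages the registers for us, and avoiding glibc entirely would require inline assembly.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_getpid */
#include <unistd.h>       /* syscall(), getpid() */

/* Hypothetical helper: request the PID by syscall number instead of
 * calling glibc's getpid() wrapper. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```

Both paths end up at the same kernel interface, so raw_getpid() and getpid() should report the same value for a given process.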

When glibc becomes relevant to hacking

The C language’s primary appeal is that it compiles human-readable source code to machine code, which is consumed directly by the CPU. However, C does not provide any guardrails to prevent a programmer from making memory management mistakes. When such mistakes are made, the programmer can inadvertently alter the internal state of their program. A classic example of this is creating a fixed-size variable to hold data (such as an audio file) and writing data to the variable without checking if the data is small enough to fit in the variable’s allocated memory.
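A minimal sketch of that mistake might look like the following. The function names and the 64-byte buffer size are hypothetical and chosen only for illustration. The first function copies attacker-influenced data without checking its length, while the second rejects input that does not fit.

```c
#include <stddef.h>
#include <string.h>

#define AUDIO_BUF_SIZE 64  /* hypothetical fixed buffer size */

/* Vulnerable: copies len bytes without checking that they fit.
 * If len > AUDIO_BUF_SIZE, memcpy writes past the end of buf. */
void load_audio_unchecked(const unsigned char *data, size_t len) {
    unsigned char buf[AUDIO_BUF_SIZE];
    memcpy(buf, data, len);
    /* ... decode and play buf ... */
}

/* Safer: reject input that does not fit in the buffer.
 * Returns 0 on success, -1 if the data is too large. */
int load_audio_checked(const unsigned char *data, size_t len) {
    unsigned char buf[AUDIO_BUF_SIZE];
    if (len > sizeof buf) {
        return -1;
    }
    memcpy(buf, data, len);
    /* ... decode and play buf ... */
    return 0;
}
```

Because buf lives on the stack in this sketch, an oversized copy in the unchecked version can overwrite whatever the compiler placed after it.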

These mistakes result in what is called “undefined behavior”. ³ Ironically, the behavior can be defined by studying the mistake when it occurs in the real world. A hacker can use this behavior to subvert the control flow of a program. The arts of discovering and manipulating mistakes that result in undefined behavior are known as binary research and binary exploitation, respectively.

Continuing from the previous example, a hacker can learn where the audio file data is stored in memory relative to the saved return address, which is the address the vulnerable function will jump back to when it finishes. Using this information, they can then craft an audio file that is not only too large for the variable but also contains the memory address of another function.

When this new audio data overflows the variable’s allocated space, it overwrites the program’s internal state, including the saved return address. Finally, when the vulnerable function returns, it returns to the hacker-specified function. Just like that, a simple audio file player can now do anything from opening the calculator application to running a /bin/bash shell.

The process of pivoting to hacker-controlled code is not always straightforward and hinges on several factors:

- “Mitigations” provided by the C compiler and the operating system
- The existence and usability of information leaks about process state, if any exist at all
- The availability of code that can be reused by the hacker to make their exploit viable, including:
  - Any other code included in the vulnerable program itself
  - External libraries that the program relies on (such as glibc)
  - External programs and services
- Other constraints imposed directly or indirectly by the design of the vulnerable code within the program

(This is not an exhaustive list, but it is ordered by what I think should be considered first to last.)

glibc can become crucial to the success and reliability of an exploit due to these constraints. When a program relies on an external library, the entirety of the library is available to the program when it runs and, by proxy, any exploit code injected into the program. This potentially makes developing and executing an exploit easier and more reliable.

Finding the C library on the filesystem

Below, we have a C program that starts an infinite loop and sleeps for five seconds in each of the loop’s iterations.

// loop.c

#include <stdio.h>
#include <unistd.h>

int main() {
    while(1) {
        sleep(5);
    }
}

Go ahead and compile the program using the GNU Compiler Collection (gcc):

$ gcc -o loop loop.c

There are several ways to discover which C library a program depends on (if any). The safest and most portable method involves combining objdump, ldconfig, and the operating system’s package manager. objdump’s -p argument tells it to write “information that is specific to the object file format” to stdout (the objdump documentation refers to this information as “private headers”). ⁴ Since loop is an ELF file, this output will include the ELF-specific “dynamic” section which specifies libraries that are required by the program. If objdump is unavailable, the readelf program with the -d argument will output just the dynamic section. ⁵

The dynamic section is documented in the “Dynamic Linking” chapter of the “Tool Interface Standard (TIS) Portable Formats Specification” document. ⁶ The _DYNAMIC symbol labels the ELF section that contains an array of “Dynamic Structures”, each of which contains different “array tags”. The DT_NEEDED tag specifies an array element that “holds the string table offset of a null-terminated string, giving the name of a needed library”. ⁶ Both objdump and readelf will annotate such array entries with the string NEEDED, which allows us to simply search for that string with grep:

$ objdump -p loop | grep NEEDED
  NEEDED               libc.so.6
# Alternatively with readelf:
$ readelf -d loop | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
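For readers curious about what those tools do under the hood, here is a rough sketch that walks an ELF file’s section headers, finds the dynamic section, and prints each DT_NEEDED string. The names read_at and print_needed are hypothetical. The sketch assumes Linux with glibc’s <elf.h> and <link.h>, a file whose ELF class and byte order match the machine it runs on, and a file that still has its section headers.

```c
#include <elf.h>
#include <link.h>    /* ElfW() picks Elf32_* or Elf64_* for this build */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read size bytes at file offset off into a freshly allocated buffer. */
static void *read_at(FILE *f, long off, size_t size) {
    void *buf = malloc(size ? size : 1);
    if (buf == NULL) return NULL;
    if (fseek(f, off, SEEK_SET) != 0 || fread(buf, 1, size, f) != size) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Print every DT_NEEDED name in the ELF file at path.
 * Returns the number of entries printed, or -1 on error. */
int print_needed(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;

    int count = -1;
    ElfW(Ehdr) eh;
    ElfW(Shdr) *sh = NULL;

    if (fread(&eh, 1, sizeof eh, f) != sizeof eh) goto out;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) goto out;
    if (eh.e_shentsize != sizeof(ElfW(Shdr)) || eh.e_shnum == 0) goto out;

    sh = read_at(f, (long)eh.e_shoff, (size_t)eh.e_shnum * sizeof *sh);
    if (sh == NULL) goto out;

    count = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_type != SHT_DYNAMIC || sh[i].sh_link >= eh.e_shnum)
            continue;
        /* sh_link of the dynamic section names its string table. */
        ElfW(Shdr) *str = &sh[sh[i].sh_link];
        ElfW(Dyn) *dyn = read_at(f, (long)sh[i].sh_offset, sh[i].sh_size);
        char *strtab = read_at(f, (long)str->sh_offset, str->sh_size);
        if (dyn != NULL && strtab != NULL && str->sh_size > 0) {
            strtab[str->sh_size - 1] = '\0';  /* guard against bad input */
            size_t n = sh[i].sh_size / sizeof *dyn;
            for (size_t j = 0; j < n && dyn[j].d_tag != DT_NULL; j++) {
                if (dyn[j].d_tag == DT_NEEDED &&
                    dyn[j].d_un.d_val < str->sh_size) {
                    printf("NEEDED %s\n", strtab + dyn[j].d_un.d_val);
                    count++;
                }
            }
        }
        free(dyn);
        free(strtab);
    }
out:
    free(sh);
    fclose(f);
    return count;
}
```

Calling print_needed("./loop") on the example program should list the same library names that objdump and readelf annotate with NEEDED. Like those tools, it only reads the file and never executes it.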

The ldd program provides similar information, including the resolved file paths of any libraries. Unfortunately, ldd has a poor security track record. According to the manual page for ldd on Linux:

[S]ome versions of ldd may attempt to obtain the dependency information by attempting to directly execute the program, which may lead to the execution of whatever code is defined in the program’s ELF interpreter, and perhaps to execution of the program itself. (In glibc versions before 2.27, the upstream ldd implementation did this for example, although most distributions provided a modified version that did not.)

Thus, you should never employ ldd on an untrusted executable, since this may result in the execution of arbitrary code. ⁷

This issue was demonstrated by Peter Krumins in their 2009 blog post: “ldd arbitrary code execution”. ⁸ In 2018, Ilya Smith demonstrated a different method of obtaining arbitrary code execution with ldd. In Smith’s bug report, they noted: “[the attacker] can re-mmap ld library code with [the attacker’s code] and successfully execute it after execution of mmap syscall.” ⁹ At least on Linux, it is unsafe to rely on ldd - especially if you are conducting reverse engineering or security research.

This is not a universal problem for this category of tooling. For example, macOS' otool, when combined with the -L argument, provides information similar to ldd’s without suffering from these security issues. ¹⁰

Getting back to the objdump output from earlier - we see that the program depends on a library named libc.so.6. The ldconfig program can be used to locate the corresponding library file path. This program manages a cache that maps shared library names to their corresponding file paths. The -p argument tells ldconfig to write the cache’s entries to stdout: ¹¹

$ ldconfig -p | grep libc.so.6
  libc.so.6 (libc6,x86-64, OS ABI: Linux 3.2.0) => /lib/x86_64-linux-gnu/libc.so.6

The entry for libc.so.6 points to the file /lib/x86_64-linux-gnu/libc.so.6. So… what C library is this, and where did it come from? Let’s ask the package manager:

# On Debian-based systems:
$ dpkg -S /lib/x86_64-linux-gnu/libc.so.6
libc6:amd64: /lib/x86_64-linux-gnu/libc.so.6
$ apt-cache show libc6:amd64
Package: libc6
Architecture: amd64
Version: 2.31-0ubuntu9.2
# ...

# On RedHat-based systems:
$ yum whatprovides /lib64/libc.so.6
glibc-2.28-151.el8.x86_64 : The GNU libc libraries
Repo        : @System
Matched from:
Filename    : /lib64/libc.so.6
# ...

Based on the information from the package manager, we now know that the program will use glibc. As seen above, the version of glibc can differ between operating systems. Such a difference signifies a separate compilation, meaning it is highly unlikely that a given piece of code will sit at the same offset within the library in both versions. This also applies to derivatives of major Linux distributions. For example, the Debian and Kali glibc packages could be built from the same source revision, yet the packaged library binaries might still be different compilations. This can happen when package maintainers isolate their software supply chain, or when they backport changes, resulting in a unique glibc build.

All of this is important to keep in mind when exploiting a program on a different computer. If your exploit relies on glibc and you do not know the glibc version on the target computer, then you cannot assume your exploit will work.
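When you can run code on the machine in question, glibc will also report its own version. The sketch below uses the glibc-specific gnu_get_libc_version() function together with the __GLIBC__ and __GLIBC_MINOR__ macros; the two helper names are hypothetical. Keep in mind that this reports only the upstream version (such as 2.31), not the distribution’s package revision, so it cannot tell two builds of the same version apart.

```c
#include <gnu/libc-version.h>  /* gnu_get_libc_version() */
#include <stdio.h>
#include <stdlib.h>

/* Major version of the glibc this process is running against. */
int runtime_glibc_major(void) {
    return atoi(gnu_get_libc_version());  /* parses the leading integer */
}

/* Print the build-time and run-time glibc versions side by side. */
void print_glibc_versions(void) {
    printf("built against glibc %d.%d\n", __GLIBC__, __GLIBC_MINOR__);
    printf("running with glibc %s\n", gnu_get_libc_version());
}
```

The macros are fixed when the program is compiled, while gnu_get_libc_version() reflects the library that was actually loaded, so the two can differ when a binary is moved between systems.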

Finding glibc in memory with `/proc/<pid>/maps`

Now that we can determine if a program depends on glibc and locate the local library file, we can move on to finding where glibc is loaded in memory at runtime. There are a few ways to do this, and the /proc pseudo-filesystem is likely the quickest. Inside /proc/<pid>/ is a file named “maps” that lists all of the process’s memory-mapped regions. Go ahead and execute the example program from earlier in the background. After doing so, retrieve the contents of /proc/<pid>/maps:

# Note: See "man 5 proc" for file format details.
$ ./loop &
[1] 28610
$ cat /proc/28610/maps
55a388da8000-55a388da9000 r--p 00000000 00:18 28373   /tmp/loop
55a388da9000-55a388daa000 r-xp 00001000 00:18 28373   /tmp/loop
55a388daa000-55a388dab000 r--p 00002000 00:18 28373   /tmp/loop
55a388dab000-55a388dac000 r--p 00002000 00:18 28373   /tmp/loop
55a388dac000-55a388dad000 rw-p 00003000 00:18 28373   /tmp/loop
7f483382f000-7f4833831000 rw-p 00000000 00:00 0
7f4833831000-7f4833857000 r--p 00000000 00:18 44177   /usr/lib/x86_64-linux-gnu/libc-2.32.so
7f4833857000-7f48339a0000 r-xp 00026000 00:18 44177   /usr/lib/x86_64-linux-gnu/libc-2.32.so
7f48339a0000-7f48339eb000 r--p 0016f000 00:18 44177   /usr/lib/x86_64-linux-gnu/libc-2.32.so
7f48339eb000-7f48339ec000 ---p 001ba000 00:18 44177   /usr/lib/x86_64-linux-gnu/libc-2.32.so
7f48339ec000-7f48339ef000 r--p 001ba000 00:18 44177   /usr/lib/x86_64-linux-gnu/libc-2.32.so
7f48339ef000-7f48339f2000 rw-p 001bd000 00:18 44177   /usr/lib/x86_64-linux-gnu/libc-2.32.so
7f48339f2000-7f48339f8000 rw-p 00000000 00:00 0
7f4833a0d000-7f4833a0e000 r--p 00000000 00:18 44172   /usr/lib/x86_64-linux-gnu/ld-2.32.so
7f4833a0e000-7f4833a2e000 r-xp 00001000 00:18 44172   /usr/lib/x86_64-linux-gnu/ld-2.32.so
7f4833a2e000-7f4833a37000 r--p 00021000 00:18 44172   /usr/lib/x86_64-linux-gnu/ld-2.32.so
7f4833a37000-7f4833a38000 r--p 00029000 00:18 44172   /usr/lib/x86_64-linux-gnu/ld-2.32.so
7f4833a38000-7f4833a3a000 rw-p 0002a000 00:18 44172   /usr/lib/x86_64-linux-gnu/ld-2.32.so
7ffe0dfcc000-7ffe0dfed000 rw-p 00000000 00:00 0       [stack]
7ffe0dff5000-7ffe0dff9000 r--p 00000000 00:00 0       [vvar]
7ffe0dff9000-7ffe0dffb000 r-xp 00000000 00:00 0       [vdso]

Each line represents a mapped memory region, or in other words, some data and its mapped location in memory. For the purposes of this example, we are interested in the first, second, and last columns of this file. Let’s take a look at the first line of the file and break it down:

# mapped-address-range    perm offset   dev   inode   file-path
55a388da8000-55a388da9000 r--p 00000000 00:18 28373   /tmp/loop

The first column denotes the mapped address range. The first address is the start (base) address (0x55a388da8000). The second address (0x55a388da9000) is the end address, which is the first address past the end of the region.

The second column (r--p) indicates the access permissions of the mapped region in the context of the process, for example, whether the process can modify the memory region or execute any data in it. In this case, the region is read-only. We will take a look at this in more detail in a subsequent section.

The last column (/tmp/loop) is the file path or name of the mapped region. Here, this region is a chunk of the executable file itself.

For more information about the maps file format, refer to section five of the manual for proc using man 5 proc. ¹²

Looking at the file paths column, we can find glibc’s region by looking for the first mention of its file path:

7f4833831000-7f4833857000 r--p 00000000 00:18 44177 /usr/lib/x86_64-linux-gnu/libc-2.32.so

The start (or base) address of glibc is 0x7f4833831000 in this case.
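The same lookup can be scripted. The following sketch scans a maps file and returns the start address of the first region whose path column contains a given substring. The function name first_mapping_base is hypothetical, and the parsing assumes the column layout shown above.

```c
#include <stdio.h>
#include <string.h>

/* Return the start address of the first region in maps_path whose
 * path column contains needle, or 0 if there is no such region. */
unsigned long first_mapping_base(const char *maps_path, const char *needle) {
    FILE *f = fopen(maps_path, "r");
    if (f == NULL) return 0;

    char line[512];
    unsigned long base = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        char path[256] = "";
        /* Columns: start-end perms offset dev inode [path] */
        int n = sscanf(line, "%lx-%lx %*s %*s %*s %*s %255[^\n]",
                       &start, &end, path);
        /* n is 3 only when the optional path column is present. */
        if (n == 3 && strstr(path, needle) != NULL) {
            base = start;
            break;
        }
    }
    fclose(f);
    return base;
}
```

For the listing above, searching for "libc-" would match the line that begins with 7f4833831000. A process can inspect itself by passing "/proc/self/maps" as the path.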

Finding glibc in memory with `gdb`

We can also use the GNU Debugger (gdb) to obtain similar information, minus permissions and inode data. This is convenient if we are already debugging a program. We can attach gdb to the example program by running:

$ gdb -p <pid>

At the gdb prompt, running info proc mappings will dump the memory region mappings in a format similar to the maps file:

(gdb) info proc mappings
process 28610
Mapped address spaces:

    Start Addr           End Addr       Size     Offset objfile
0x55a388da8000     0x55a388da9000     0x1000        0x0 /tmp/loop
0x55a388da9000     0x55a388daa000     0x1000     0x1000 /tmp/loop
0x55a388daa000     0x55a388dab000     0x1000     0x2000 /tmp/loop
0x55a388dab000     0x55a388dac000     0x1000     0x2000 /tmp/loop
0x55a388dac000     0x55a388dad000     0x1000     0x3000 /tmp/loop
0x7f483382f000     0x7f4833831000     0x2000        0x0
# The following lines represent glibc's memory mapping:
0x7f4833831000     0x7f4833857000    0x26000        0x0 /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x7f4833857000     0x7f48339a0000   0x149000    0x26000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x7f48339a0000     0x7f48339eb000    0x4b000   0x16f000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x7f48339eb000     0x7f48339ec000     0x1000   0x1ba000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x7f48339ec000     0x7f48339ef000     0x3000   0x1ba000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x7f48339ef000     0x7f48339f2000     0x3000   0x1bd000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x7f48339f2000     0x7f48339f8000     0x6000        0x0
0x7f4833a0d000     0x7f4833a0e000     0x1000        0x0 /usr/lib/x86_64-linux-gnu/ld-2.32.so
0x7f4833a0e000     0x7f4833a2e000    0x20000     0x1000 /usr/lib/x86_64-linux-gnu/ld-2.32.so
0x7f4833a2e000     0x7f4833a37000     0x9000    0x21000 /usr/lib/x86_64-linux-gnu/ld-2.32.so
0x7f4833a37000     0x7f4833a38000     0x1000    0x29000 /usr/lib/x86_64-linux-gnu/ld-2.32.so
0x7f4833a38000     0x7f4833a3a000     0x2000    0x2a000 /usr/lib/x86_64-linux-gnu/ld-2.32.so
0x7ffe0dfcc000     0x7ffe0dfed000    0x21000        0x0 [stack]
0x7ffe0dff5000     0x7ffe0dff9000     0x4000        0x0 [vvar]
0x7ffe0dff9000     0x7ffe0dffb000     0x2000        0x0 [vdso]
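Outside of a debugger, a process can also ask the dynamic loader which objects it has loaded. The sketch below is one possible approach, using glibc’s dl_iterate_phdr() function; the struct and function names are hypothetical. It assumes that dlpi_addr, the load address the loader reports, coincides with the start of the library’s first mapping, which is typical for shared libraries but not guaranteed.

```c
#define _GNU_SOURCE
#include <link.h>    /* dl_iterate_phdr(), struct dl_phdr_info */
#include <stdint.h>
#include <string.h>

struct object_query {
    const char *needle;  /* substring to look for in the object name */
    uintptr_t base;      /* load address of the first match, 0 if none */
    int seen;            /* number of loaded objects visited */
};

/* Callback invoked once per loaded object. */
static int visit_object(struct dl_phdr_info *info, size_t size, void *data) {
    struct object_query *q = data;
    (void)size;
    q->seen++;
    if (info->dlpi_name != NULL &&
        strstr(info->dlpi_name, q->needle) != NULL) {
        q->base = (uintptr_t)info->dlpi_addr;
        return 1;  /* a non-zero return stops the iteration */
    }
    return 0;
}

/* Search the objects loaded into the current process by name. */
struct object_query find_loaded_object(const char *needle) {
    struct object_query q = { needle, 0, 0 };
    dl_iterate_phdr(visit_object, &q);
    return q;
}
```

In a dynamically linked program, find_loaded_object("libc").base would be expected to line up with the first glibc row in the mappings, under the assumption stated above.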

Multiple memory regions for a single glibc

You might be wondering why there are multiple entries for glibc in the maps file and in the above gdb output. This is because different parts of the library need different access permissions. For example, glibc has global variables that programs can modify at runtime. To permit this, the memory region containing those variables must be writable.

This can be seen in gdb by looking up a glibc global variable such as the venerable __malloc_hook variable. ¹³ There are (at least) two ways to get the address of a symbol in gdb: info address <symbol> and p &<symbol>:

(gdb) info address __malloc_hook
Symbol "__malloc_hook" is static storage at address 0x7f48339efb90.
(gdb) p &__malloc_hook
$4 = (void *(**)(size_t, const void *)) 0x7f48339efb90 <__malloc_hook>

Looking back at the output from maps, we can see that this address (0x7f48339efb90) falls in the last glibc region. That region is marked as readable and writable, as denoted by the r and w in its permissions column:

7f48339ef000-7f48339f2000 rw-p 001bd000 00:18 44177   /usr/lib/x86_64-linux-gnu/libc-2.32.so
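Matching an address to its region by eye gets tedious, so here is a small sketch that does it for the current process. The function name region_perms is hypothetical. It reads /proc/self/maps, finds the region whose range contains the address, and copies out that region’s permission string.

```c
#include <stdint.h>
#include <stdio.h>

/* Look up addr in /proc/self/maps and copy the containing region's
 * permission string (for example "rw-p") into perms. Returns 0 on
 * success, or -1 if no region contains addr or the file is unreadable. */
int region_perms(const void *addr, char perms[5]) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL) return -1;

    uintptr_t target = (uintptr_t)addr;
    char line[512];
    int found = -1;
    while (found != 0 && fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        char p[5];
        /* The end address is exclusive, hence target < end. */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) == 3 &&
            target >= start && target < end) {
            snprintf(perms, 5, "%s", p);
            found = 0;
        }
    }
    fclose(f);
    return found;
}
```

Passing the address of a local variable should report a readable and writable region, while passing the address of a function should report an executable one.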

Summary

In this post, we reviewed the purpose of glibc and its relationship with a basic C program and the operating system. We also learned:

- Why hackers may find glibc useful when developing an exploit
- That the entirety of a dynamically linked library’s code is usable by a dependent process at runtime
- That ldd is unsafe and ldconfig combined with objdump can make for a safer alternative
- How to locate glibc on the file system and in memory

In part two, we will take a closer look at the relationship between a program and glibc, specifically the challenges involved in locating and fingerprinting glibc at runtime.

References

1. wikipedia.org. Accessed: 2022, January 9. “Application binary interface”.
2. wikipedia.org. Accessed: 2022, January 9. “Calling convention”.
3. wikipedia.org. Accessed: 2022, January 9. “Undefined behavior”.
4. man7.org. 2021, February 6. “objdump(1) - Linux manual page”.
5. man7.org. 2021, February 6. “readelf(1) - Linux manual page”.
6. Tool Interface Standard Committee. 1993, October. “Tool Interface Standard (TIS) Portable Formats Specification Version 1.1”.
7. man7.org. 2021, August 27. “ldd(1) - Linux manual page”.
8. Krumins, Peter. 2009, October 26. “ldd arbitrary code execution”.
9. Smith, Ilya. 2018, February 16. “(CVE-2019-1010023) - ldd should protect against programs whose segments overlap with the loader itself”.
10. opensource.apple.com. 2017, June 22. “otool-classic.1”.
11. man7.org. 2021, March 22. “ldconfig(8) - Linux manual page”. Note: Using ldconfig with objdump was suggested by Thomas Strömberg. Thank you, Thomas.
12. man7.org. 2021, August 27. “proc(5) - Linux manual page”.
13. man7.org. 2021, March 22. “malloc_hook(3) - Linux manual page”.

2022-08-16

control.rip
