Exploits, ASLR and randomness

When trying to bypass DEP, I often use a heap spray to get data (including my shellcode) into a predictable location first. Next, I use ret-into-libc to call VirtualProtect to give the chunk of the heap that contains my shellcode “RWE” permissions. Finally, I return to my shellcode, which can now be executed without causing an exception. However, if ASLR is enabled, you must first bypass that to find out where VirtualProtect is located in memory.

I don’t have as much time as I used to for writing exploits for the bugs I find, but when I do write one, I usually work in incremental steps. I first create a simple version that ignores ASLR and DEP, and make sure it works with both disabled. I then add the code that uses ret-into-libc to bypass DEP and give it the exact location of VirtualProtect, to make sure that works as well. Only then do I add the code that automatically determines the location of VirtualProtect to bypass ASLR. Because I have ASLR enabled on most of my systems, I created a simple tool to extract the current location of a function such as VirtualProtect:


C:\Sample>type vp.c
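#define WINVER 0x0500
#define _WIN32_WINNT 0x0500
#include <windows.h>
#include <stdio.h>

/* Prints the base address of a module and, optionally, the offset of one of
   its exported functions. The UINT casts assume a 32-bit build. */
int main(int argc, char** argv) {
  HMODULE hModule = 0;
  FARPROC pFunction = 0;
  if (argc < 2 || argc > 3) {
    printf("Usage:\r\n %s module_name [function_name]\r\n", argv[0]);
    return 1;
  } else {
    /* Map the module without resolving its imports or calling its entry point. */
    hModule = LoadLibraryEx(argv[1], NULL, DONT_RESOLVE_DLL_REFERENCES);
    if (!hModule) {
      printf("Module not found!\r\n");
      return 1;
    } else {
      printf("Module base : %08X\r\n", (UINT)hModule);
      if (argc == 3) {
        pFunction = GetProcAddress(hModule, argv[2]);
        if (!pFunction) {
          printf("Function not found!\r\n");
          return 1;
        } else {
          /* The offset from the module base stays the same across reboots. */
          printf("Function offset : %+8X\r\n", (UINT)pFunction - (UINT)hModule);
        }
      }
    }
  }
  return 0;
}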
 
C:\Sample>build
== Sample ==
@ Generating build configuration.
@ Version 0.1 alpha, build 1, started at Fri, 03 Sep 2010 07:55:45 (UTC)
[Sample]
+ Build: vp.obj
+ Build: vp.exe
- Cleanup: vp.ilk
- Cleanup: vp.obj
@ Project built successfully.
@ Build successful.
 
C:\Sample>vp %SystemRoot%\system32\kernel32.dll
Module base : 77000000
 
C:\Sample>vp %SystemRoot%\system32\kernel32.dll VirtualProtect
Module base : 77000000
Function offset : 134EC
 
C:\Sample>
 

As you can see, both times I ran the tool, the base address of kernel32.dll was the same. This is because the base address is only re-randomized at boot time, so until you reboot your machine, you can hard-code the value obtained this way into your exploit.

So, how random is the base address of kernel32.dll in real life? One way to find out is to set up a Windows machine to automatically run a script at startup that extracts the base address of kernel32.dll using the code above and then reboots. If you let this run for a while, you get a number of different values. Here’s a script I created to do just that:


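@ECHO OFF
REM Append this boot's kernel32.dll base address to a per-machine log file.
vp.exe "%SystemRoot%\system32\kernel32.dll" >> "%COMPUTERNAME%.txt"
REM Keep rebooting for as long as continue.txt exists.
IF EXIST continue.txt (
  shutdown.exe -r -t 0
)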
 

In addition to logging the base address of kernel32.dll in a file named after the machine it is running on, and rebooting the machine, it also checks for the existence of a file called “continue.txt”. That way, I can stop the machine from continuously rebooting by deleting that file (the script is loaded off a network share, so I can access the file from another machine). I used the “CONTROL USERPASSWORDS2” configuration panel to tell Windows to automatically log in as a local user account at startup, and put the script in the “startup” folder of that local user.

After letting this run for a while on a 32-bit Vista SP2 en-US virtual machine, I used the following Python script to extract some useful data from the information I gathered:


import sys

if __name__ == "__main__":
    # Read the log written by the startup script: one "Module base : XXXXXXXX" line per boot.
    with open(sys.argv[1], 'r') as log_file:
        lines = log_file.read().splitlines()
    base_addresses_counts = {}
    results_count = 0
    for line in lines:
        if not line.strip():
            continue
        results_count += 1
        # The base address is the hexadecimal number after the colon.
        base_address = int(line.split(':', 1)[1], 16)
        base_addresses_counts[base_address] = base_addresses_counts.get(base_address, 0) + 1
    base_addresses = sorted(base_addresses_counts)
    lowest_base_address = base_addresses[0]
    highest_base_address = base_addresses[-1]
    # Start with the largest possible delta and shrink it while walking the sorted list.
    smallest_delta = highest_base_address - lowest_base_address
    previous_base_address = None
    print('    Base     |   Offset    |    Delta    |    Count')
    print('-------------|-------------|-------------|--------------')
    for base_address in base_addresses:
        offset = base_address - lowest_base_address
        if previous_base_address is not None:
            delta = base_address - previous_base_address
            if delta < smallest_delta:
                smallest_delta = delta
        else:
            delta = 0
        print(' %11s | %11s | %11s | %d' % (
            '0x%08X' % base_address, '+0x%X' % offset, '+0x%X' % delta,
            base_addresses_counts[base_address]))
        previous_base_address = base_address
    print("-------------'-------------'-------------'--------------")
    print(' Total runs: %d' % results_count)
    print(' Total different values: %d' % len(base_addresses))
    print(' Smallest delta: 0x%X' % smallest_delta)
    # After the loop, offset is the distance between the lowest and highest address.
    if smallest_delta:
        print(' Total possible values: >= %(v)d (%(v)X)' % {'v': offset // smallest_delta})
 

Here’s part of the output of this script:

C:\Sample>analyze.py VM3-V32SP2-N.txt
    Base     |   Offset    |    Delta    |    Count
-------------|-------------|-------------|--------------
  0x75490000 |        +0x0 |        +0x0 | 1
  0x75550000 |    +0xC0000 |    +0xC0000 | 1
  0x75580000 |    +0xF0000 |    +0x30000 | 1
  0x755A0000 |   +0x110000 |    +0x20000 | 2
<snip>
  0x77E40000 |  +0x29B0000 |    +0x40000 | 1
  0x77E80000 |  +0x29F0000 |    +0x40000 | 1
  0x77EB0000 |  +0x2A20000 |    +0x30000 | 1
  0x77ED0000 |  +0x2A40000 |    +0x20000 | 2
-------------'-------------'-------------'--------------
 Total runs: 807
 Total different values: 460
 Smallest delta: 0x10000
 Total possible values: >= 676 (2A4)
 
C:\Sample>

To clarify: the machine was rebooted 806 times, collecting another base address each time, for a total of 807 base addresses. These were spread over 460 different values, with some values occurring more than once. Given the number of runs and the random way in which the addresses get chosen, it is to be expected that some values occur more often than others and that some values do not occur at all. Based on the lowest and highest values (0x75490000 and 0x77ED0000) and the smallest difference between two addresses (0x10000), I calculate that there are at least 676 different possible values for the base address.
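
The arithmetic behind that lower bound, together with a rough sanity check on the number of distinct values, can be sketched as follows. The function names are made up for this illustration, and the second function assumes that every possible base address is equally likely, which the data above does not establish.

```python
def possible_values_lower_bound(lowest, highest, smallest_delta):
    """Lower bound on the number of possible base addresses, computed the
    same way as the "Total possible values" line of the analysis script."""
    return (highest - lowest) // smallest_delta


def expected_distinct(slots, runs):
    """Expected number of distinct values seen after `runs` independent draws
    from `slots` equally likely values (uniformity is an assumption here)."""
    return slots * (1.0 - (1.0 - 1.0 / slots) ** runs)


if __name__ == "__main__":
    bound = possible_values_lower_bound(0x75490000, 0x77ED0000, 0x10000)
    print("Lower bound: %d (0x%X)" % (bound, bound))
    print("Expected distinct values in 807 runs: %.0f" % expected_distinct(bound, 807))
```

Under that uniformity assumption, 807 runs over 676 slots should produce roughly 470 distinct values, which is in the same ballpark as the 460 observed.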

I was a bit surprised by the results. I haven’t kept up to date with ASLR randomness, but IIRC it was 8 bits (256 possible values) last time I checked. Microsoft appears to have increased the randomness of their ASLR implementation in Vista. This makes a brute force attack against ASLR, in which you try all possible values until you find the right one, take longer. It also decreases the chances of success for an attacker who only has one try at guessing the address: a 1/256 chance is bad, a 1/676 chance is worse.
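
To put numbers on that comparison, here is a minimal sketch. It assumes that all candidate addresses are equally likely, that the real address does not change between attempts (no reboot), and that the attacker never tries the same value twice. The function names are hypothetical.

```python
from fractions import Fraction


def single_guess_chance(slots):
    """Chance of guessing the right base address in one try."""
    return Fraction(1, slots)


def expected_attempts(slots):
    """Average number of attempts needed when every candidate is tried once,
    in random order, until the right one is hit: (slots + 1) / 2."""
    return Fraction(slots + 1, 2)


if __name__ == "__main__":
    for slots in (256, 676):
        print("%d possible values: 1 in %d per guess, %.1f attempts on average" % (
            slots, slots, float(expected_attempts(slots))))
```

Going from 256 to 676 possible values raises the average brute force effort from 128.5 to 338.5 attempts under these assumptions.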

Should you decide to run a similar test, let me know what OS you tested and what values you found!

6 Comments to “Exploits, ASLR and randomness”

  1. Jon
    2010/09/03

    For EXE files, there are 256 possible base addresses. It’s different for DLLs. There’s a description of the ASLR process for DLL files in Windows Internals – see here: http://my.safaribooksonline.com/9780735625303/759.

    The address of the first loaded DLL (ntdll.dll) is going to be one of 256 values, and each subsequent DLL loaded will be based on that address. The reason you see more than 256 addresses for the DLL you’re checking is because the load order is also randomized.

    If you check ntdll.dll instead of kernel32.dll, you’ll probably find that there are only 256 possible addresses it can be loaded at.

  2. Dinos
    2010/09/04

    Hi, on windows 7

    Total runs: 1234
    Total different values: 533
    Smallest delta: 0x10000
    Total possible values: >= 714 (2CA)

    and for better comparison with your results,

    Total runs: 807
    Total different values: 444
    Smallest delta: 0x10000
    Total possible values: >= 714 (2CA)

  3. SkyLined
    2010/09/05

    Thanks Jon, I ran another test and collected the base addresses of kernel32 and ntdll each time. Here are the results:


    ======================== kernel32 ======================
        Base     |   Offset    |    Delta    |    Count
    -------------|-------------|-------------|--------------
      0x75460000 |        +0x0 |        +0x0 | 1
      0x75480000 |    +0x20000 |    +0x20000 | 2
      0x75490000 |    +0x30000 |    +0x10000 | 1
    <snip>
      0x77E80000 |  +0x2A20000 |    +0x20000 | 1
      0x77EC0000 |  +0x2A60000 |    +0x40000 | 2
      0x77EE0000 |  +0x2A80000 |    +0x20000 | 1
    -------------'-------------'-------------'--------------
     Total runs: 2109
     Total different values: 598
     Smallest delta: 0x10000
     Total possible values: >= 680 (2A8)
     
    ========================= ntdll ========================
        Base     |   Offset    |    Delta    |    Count
    -------------|-------------|-------------|--------------
      0x76DB0000 |        +0x0 |        +0x0 | 14
      0x76DC0000 |    +0x10000 |    +0x10000 | 12
      0x76DD0000 |    +0x20000 |    +0x10000 | 11
    <snip>
      0x77D80000 |   +0xFD0000 |    +0x10000 | 8
      0x77D90000 |   +0xFE0000 |    +0x10000 | 4
      0x77ED0000 |  +0x1120000 |   +0x140000 | 9
    -------------'-------------'-------------'--------------
     Total runs: 2109
     Total different values: 256
     Smallest delta: 0x10000
     Total possible values: >= 274 (112)
     

    As expected, ntdll has 256 possible locations, each one 0x10000 bytes offset from the previous, except the last one. I have no idea why the last one is offset by more than is to be expected. Either way, if you are doing a brute force attack against ASLR, it makes more sense to use ntdll functions rather than kernel32 functions. In the above case of an attack against ASLR/DEP, you will want to use ntdll!ZwProtectVirtualMemory rather than kernel32!VirtualProtect.

  4. Dinos
    2010/09/16

    So is it possible if there is no non-aslr module in the system to build a shellcode to detect ntdll.dll location and take it from there … ?

  5. SkyLined
    2010/09/16

    @Dinos: Why would you want to? Once you are executing shellcode, you can do whatever you want.

    The problem is that you need to break a circular dependency: you want to run shellcode, but DEP makes this difficult. To bypass DEP you want to use ret-into-libc, but ASLR makes this difficult. If you want to bypass ASLR by executing shellcode, you are creating a loop: shellcode comes after DEP, DEP comes after ASLR and ASLR comes after shellcode. You have to find a way to break the loop. Guessing ASLR addresses is one way, assuming you can guess well enough and/or often enough.

    You normally only get one chance: most applications will crash if you guess wrong. Some will restart once or twice, which means you have a slightly larger (but still slim) chance. If you can find a way to restart the application an unlimited number of times, you can try different addresses until you guess correctly. More on that later :)

  6. Dinos
    2010/09/16

    Yes, my thoughts were on guessing, so waiting for more! :)
