Wiki back up

In other news: I am looking for a better hosting provider, or at least one that does what I pay them to do, or at the very least what they promise.

Google Chrome 4

Google Chrome 4 has been released. It fixes a number of security issues, including some that I found:

Update: these bugs have had their access restrictions lifted and details can be found there.

Microsoft Internet Explorer 6.0/7.0 NULL pointer crashes

Two crashes caused by NULL pointer dereferences have been discovered in MSIE 6.0/7.0. These issues do not affect MSIE 8.0.

I’ve recently started using Google Code for tracking bugs: an edited version of the history of this bug can be found here.

Advances in heap spraying #1: when size matters.

http://skypher.com/SkyLined/heap_spray/small_heap_spray_generator.html

I’ve created a heap-spray generator. It generates a small piece of JavaScript that sprays the heap using the following customizable settings:


  • Shellcode, easy to enter using hexadecimal byte values (see also BETA3).

  • Target address and block size.

  • Heap header size, based on target browsers or a manual value.


The resulting code is smaller than any heap-spray I’ve seen in the wild:

  • The heap-spray code itself is just over 70 bytes.

  • The shellcode can be encoded using a custom-built 7-bit encoding.


Most exploits contain shellcode encoded as “\uXXXX” or even “%uXXXX”. The resulting encoded shellcode data contains 3 bytes for every byte in the original shellcode. This is very wasteful, and it is quite easy to improve on by creating a custom en-/decoder. The “7-bit” encoding I created converts the 16-bit characters in the Unicode string that contains the shellcode to a series of 7-bit values, which are encoded into Latin-1 characters. The resulting encoded shellcode data contains only 1.125 bytes for every byte in the shellcode, a saving of almost 63% compared to conventional encodings.
The heap-spray will of course need some additional code to decode the shellcode, so the combined code+data will only be smaller for large enough shellcodes. Because my decoder is also rather small (just under 130 bytes), the break-even point is just under 70 bytes of shellcode. For a 100 byte shellcode, you save about 50 bytes and for a 200 byte shellcode, you save about 200 bytes!
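
The size arithmetic above can be checked with a few lines of Python. This is only a rough sketch: it takes the decoder as exactly 130 bytes (the text says "just under"), uses the quoted ratios of 3 and 1.125 bytes per shellcode byte, and the function names are made up for illustration.

```python
# Sizes are in bytes. DECODER_SIZE is an assumption: the text only says
# the decoder is "just under 130 bytes".
DECODER_SIZE = 130
CONVENTIONAL_RATIO = 3.0   # "\uXXXX": 6 characters for every 2 shellcode bytes
SEVEN_BIT_RATIO = 1.125    # ratio quoted in the text for the 7-bit encoding


def conventional_size(shellcode_len):
    """Encoded size with "\\uXXXX"-style escapes (no decoder needed)."""
    return CONVENTIONAL_RATIO * shellcode_len


def seven_bit_size(shellcode_len):
    """Encoded size with the 7-bit encoding, including the decoder."""
    return SEVEN_BIT_RATIO * shellcode_len + DECODER_SIZE


def saving(shellcode_len):
    """Bytes saved by the 7-bit encoding; negative below the break-even point."""
    return conventional_size(shellcode_len) - seven_bit_size(shellcode_len)


def break_even():
    """Shellcode length at which both approaches produce the same total size."""
    return DECODER_SIZE / (CONVENTIONAL_RATIO - SEVEN_BIT_RATIO)


if __name__ == "__main__":
    print("break-even:", round(break_even(), 1))
    for length in (100, 200):
        print(length, "byte shellcode saves", saving(length), "bytes")
```

With these assumed inputs the break-even point lands between 69 and 70 bytes, in line with the figure quoted above.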

You can try out the heap-spray generator here.

Wiki temporarily down

Unfortunately, our wiki is broken. Due to having a social life, it may be a while before we can restore functionality. :(

w32-exec-calc-shellcode released

I’ve released the source for a 97 byte shellcode that executes calc.exe.

Download and LoadLibrary shellcode released

Everyone and their dog seems to want to use download and execute shellcode in their exploits, even though this has some drawbacks:

  • You need to create an .exe file on the system, which will very likely draw unwanted attention.
  • You cannot use an API that downloads your file to a temporary location, because that will likely not retain the .exe extension.
  • You need to make an assumption about where a safe place is to write your .exe file, which means you can guess wrong and the code fails.
  • You need to store the string ‘.exe’ in the download & execute shellcode, which means this is 4 bytes larger.
  • You need to spawn an extra process, which will very likely draw attention.
  • You leave cleaning up the exploited process to the download & execute shellcode, which means this needs to be larger.

To get around these problems, I created download and LoadLibrary shellcode: a shellcode that will download a DLL file to a temporary file and load it into the exploited process using LoadLibrary. The benefits of this approach are:

  • Smaller code.
  • You can use the URLDownloadToCacheFileA API function in urlmon that downloads and saves your DLL to a temporary file, meaning you do not need to provide a location.
  • No need to create an .exe file on the system: the extension of a DLL is irrelevant.
  • No need to spawn an extra process.
  • You can clean up the exploited process from the code in the DLL instead of the shellcode.

The size of the final shellcode depends on the length of the URL for your DLL. For the most recent version of the code it is 138 bytes + the length of the URL. This is a pretty decent reduction from the average download and execute shellcodes of 200+ bytes (excluding the URL) that I found around the interwebs.

Project homepage:
http://code.google.com/p/w32-dl-loadlib-shellcode/

Testival released

During shellcode development, it makes sense to have a program that can easily load your shellcode at a controllable location, allows you to set registers and memory to certain values, and executes the shellcode by setting EIP through a RET or CALL instruction.

The Testival project aims to do all those things and more: it also allows you to test ret-into-libc attacks, set the type of memory allocation you want (RWE flags, etc…), report exceptions in your code to stdout as well as load DLLs to test shellcode in DllMain.

Testival is used by ALPHA3 for automatically testing if all the en-/decoders work.

Testival requires SkyBuild to automatically build all files.

ALPHA3 released

I realized that if I waited until I had fully documented everything in ALPHA3, it would probably never get released. So, without further ado, documentation or explanations:

It has been developed and tested on Windows, but it should not be too hard to get it to run on other platforms. If you are having difficulty on other platforms and manage to create patches to fix this, please let me know and/or become a committer to the project!

PS. My apologies for my lack of 1337 Python coding skills to whoever gets to port it to Metasploit – I did this project in Python while I was learning the language :)

Countslide alphanumeric GetPC

One limitation of most alphanumeric shellcode decoders, including those in ALPHA2 and the soon-to-be-released ALPHA3, is that they need to know where they are located in memory in order to decode themselves and run correctly. This makes using a nopslide hard in most circumstances, because you mostly only need a nopslide if you do not know exactly where your shellcode is in memory to begin with.

Countslide GetPC is a new technique that I developed to allow the use of nopslides and determine exactly where your shellcode is if you can roughly predict where it will be located in memory.

Given a range of addresses Amin to Amax in which you can predict your shellcode to start, we will calculate the average address Aavg and the maximum absolute deviation Dmax like so:
Aavg == (Amin + Amax) / 2
Dmax == (Amax – Amin) / 2
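
As a small illustration of these two formulas, here is a Python sketch. The function name and the example address range are hypothetical, and it assumes the range has an even width so that integer division is exact.

```python
def predict(a_min, a_max):
    """Return (a_avg, d_max) for a predicted start-address range.

    Assumes a_max - a_min is even, so both divisions are exact.
    """
    a_avg = (a_min + a_max) // 2   # Aavg == (Amin + Amax) / 2
    d_max = (a_max - a_min) // 2   # Dmax == (Amax - Amin) / 2
    return a_avg, d_max


if __name__ == "__main__":
    # Hypothetical range, chosen only to make the numbers easy to read.
    a_avg, d_max = predict(0x0C0C0000, 0x0C0C0400)
    print(hex(a_avg), hex(d_max))
    # The range is recovered as Aavg - Dmax .. Aavg + Dmax.
    print(hex(a_avg - d_max), hex(a_avg + d_max))
```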
 

Using a nopslide of length Dmax * 2 starting at an address in this range and a return address of Aavg + Dmax will always cause the nopslide to get hit and thus the code at the end of the nopslide to get executed:

[Diagram: three placements of the nopslide relative to the return address Aavg + Dmax, on an axis marked Aavg – Dmax, Aavg and Aavg + Dmax. For D = -Dmax, O = 2 * Dmax; for D = X, O = Dmax – X; for D = +Dmax, O = 0.]

In this example, the actual deviation D of the nopslide's base address from Aavg determines where in the nopslide the exploit ends up jumping to. The base address of the nopslide, Anop, plus the offset O in the nopslide where execution starts is equal to the return address Aavg + Dmax:

Anop + O == Aavg + Dmax
 

Because Aavg and Dmax are values we predict, we can calculate the base address Anop of the nopslide if we can calculate O. And because we know the length of the nopslide is Dmax * 2, we can calculate the base address of the code that follows the nopslide Apatcher as well:

Anop == Aavg + Dmax – O
Apatcher == Aavg + Dmax * 3 – O
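
The relationship between O, Anop and Apatcher can be sketched in Python as follows. The function names and example numbers are hypothetical; the formulas are the ones given above.

```python
def offset_into_nopslide(a_avg, d_max, a_nop):
    """O: where in the nopslide execution starts (Anop + O == Aavg + Dmax)."""
    return a_avg + d_max - a_nop


def bases_from_offset(a_avg, d_max, o):
    """Recover (Anop, Apatcher) from the predicted values and the offset O."""
    a_nop = a_avg + d_max - o
    a_patcher = a_avg + d_max * 3 - o
    return a_nop, a_patcher


if __name__ == "__main__":
    a_avg, d_max = 0x0C0C0200, 0x200          # hypothetical predictions
    for d in (-d_max, 0, d_max):              # actual deviation of Anop from Aavg
        a_nop = a_avg + d
        o = offset_into_nopslide(a_avg, d_max, a_nop)
        print("D =", d, "O =", o, [hex(a) for a in bases_from_offset(a_avg, d_max, o)])
```

For D = -Dmax, 0 and +Dmax this gives O = 2 * Dmax, Dmax and 0, and in each case the patcher starts exactly Dmax * 2 bytes after Anop.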
 

So, any address Aavg + Dmax * 3 + X will be in the code that follows the nopslide at offset O + X (if that code is large enough). We can choose to overwrite a byte at that address to modify the code following the nopslide. Which byte of the code gets modified depends entirely on the value of O. This means that the value of O can directly influence what our code does and this is what we use to calculate the value of O.

A small piece of code of length P, which I will call the patcher, is put after the nopslide, followed by a second nopslide of length Dmax * 2, which I will call the countslide. When executed, the patcher overwrites a byte in the countslide at address Aavg + Dmax * 3 + P (the modification address), which is always inside the countslide. Here’s an example:

[Diagram: memory layout of nopslide, patcher and countslide. The nopslide runs from Anop to Anop + Dmax * 2, the patcher ends at Anop + Dmax * 2 + P and the countslide ends at Anop + Dmax * 4 + P. The return address Aavg + Dmax equals Anop + O; the modification address Aavg + Dmax * 3 + P equals Anop + O + P + Dmax * 2.]

The countslide will consist entirely of one-byte INC ECX instructions. The patcher will overwrite one byte at the predictable address Aavg + Dmax * 3 + P with a one-byte POP ECX instruction. It then stores the predictable value Aavg + Dmax * 3 + P + 1 on the stack, after which the countslide is executed.

Here is what will happen after the exploit makes code jump to address Aavg + Dmax in the nopslide:

  • the nopslide executes until it reaches the patcher,
  • the patcher modifies the countslide at Aavg + Dmax * 3 + P,
  • the patcher saves the value Aavg + Dmax * 3 + P + 1 on the stack, after which the countslide is executed,
  • the countslide increments ECX over and over, acting like a normal nopslide, until it runs into the patched POP ECX,
  • the POP ECX instruction pops the value Aavg + Dmax * 3 + P + 1, saved there by the patcher, off the stack into ECX,
  • the countslide then continues to increment ECX for every one byte instruction it executes, until it reaches its end.

The number Ninc of INC ECX instructions executed in the countslide after the POP ECX depends on Dmax and O as follows:

Ninc == Dmax * 2 – O – 1
 

So, taking into account that the POP ECX sets ECX to Aavg + Dmax * 3 + P + 1, after the countslide has completely been executed, the value in ECX will be:

ECX == Aavg + Dmax * 3 + P + 1 + Ninc
ECX == Aavg + Dmax * 5 + P – O
 

And because Anop + O == Aavg + Dmax, this means the value in ECX is:

ECX == Anop + Dmax * 4 + P
 

Which, as you can see in the second diagram above, is exactly where our countslide ends, so at this point ECX == EIP. The countslide is followed by the shellcode, which can use ECX as the source of its base address.
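
To sanity-check the derivation, here is a purely symbolic Python model of the layout. It is a sketch under stated assumptions, not the ALPHA3 implementation: no machine code is involved, addresses are plain integers, 32-bit wrap-around is ignored, and the function name and example values are made up. The model rejects the edge case O == Dmax * 2, where the modification address would land just past the countslide.

```python
def run_countslide(a_avg, d_max, p, a_nop):
    """Model nopslide -> patcher -> countslide; return (ecx, eip) at the end.

    a_avg, d_max: predicted values; p: patcher length; a_nop: the actual
    (unknown to the code being modelled) base address of the nopslide.
    """
    a_patcher = a_nop + 2 * d_max        # nopslide is Dmax * 2 bytes long
    a_count = a_patcher + p              # countslide follows the patcher
    a_end = a_count + 2 * d_max          # countslide is Dmax * 2 bytes long
    mod_addr = a_avg + 3 * d_max + p     # predictable modification address

    eip = a_avg + d_max                  # return address
    if not (a_nop <= eip < a_patcher):
        raise ValueError("return address misses the nopslide")
    if not (a_count <= mod_addr < a_end):
        raise ValueError("modification address misses the countslide")

    # The nopslide runs into the patcher, which patches one byte of the
    # countslide and leaves mod_addr + 1 on the stack.
    stack = [mod_addr + 1]
    eip = a_count

    ecx = 0                              # arbitrary starting value
    while eip < a_end:
        if eip == mod_addr:
            ecx = stack.pop()            # the patched POP ECX
        else:
            ecx += 1                     # INC ECX
        eip += 1                         # every instruction is one byte
    return ecx, eip


if __name__ == "__main__":
    a_avg, d_max, p = 0x1000, 0x10, 5    # hypothetical values
    for d in range(-d_max + 1, d_max + 1):
        ecx, eip = run_countslide(a_avg, d_max, p, a_avg + d)
        assert ecx == eip == a_avg + d + 4 * d_max + p
    print("ECX == EIP at the end of the countslide for every tested offset")
```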

*UPDATE*: ALPHA3 comes with a working version of Countslide mixedcase alphanumeric ascii GetPC for x86.