Here is a selective list of research projects that I have worked on in my spare time (or as part of my work).
Please leave a comment below or preferably e-mail me if you have any questions regarding the projects.
Decompiler (work in progress)
The initial goal of this project was to improve the current state of the art in ARM decompilation (Hex-Rays) by performing extra program analysis with the Hex-Rays SDK. However, as the project progressed, the focus shifted toward designing and implementing a generic decompiler engine/framework that leverages LLVM components. The first supported target will still be ARM.
No details available, yet.
Software-based Remote Attestation
Nothing, yet.
Poor Man’s Control Flow Integrity
This research was funded by DARPA as part of the Cyber Fast Track (CFT) program to develop a practical implementation of Control Flow Integrity (CFI) for securing native code. The original CFI technique was proposed in a seminal paper from Microsoft Research back in 2005. While this approach could eliminate a critical bug class (control-flow hijacking), it had impractical overhead and limitations: CFI is enforced by constructing a complete Control-Flow Graph (CFG) and validating the node/edge every time an indirect branch (including returns) occurs. In our research, we relax a few conditions (hence the name) to reduce the overhead while still preserving strong security guarantees. Our benchmark shows about 6-8% run-time performance overhead.
This was presented at SECUINSIDE Conference in 2012.
How is Poor Man’s CFI (PMCFI) different from traditional CFI?
- does not require generation of a control-flow graph (CFG) – EASY
- does not use memory store (for tags) – LIGHT
- only validates procedural (function call/ret) integrity – FAST
- guarantees similar CFI security – SECURE
- is implemented via both recompilation (compiler modification) and run-time instrumentation – APPLICABLE
Underlying concepts are simple.
We use a minimal set of instructions for call/ret target validation to reduce cache pressure as well as the number of execution cycles. To significantly reduce the number of ROP gadgets, we generate distinct CALL_TAG and RET_TAG values that differ substantially from each other. More specifically, we generate large (32-bit) unique tags that do not occur anywhere in compiler-generated code (see below).
Every function in the program starts with CALL_TAG, and RET_TAG immediately follows every call site.
Finally, the verification routine is inserted prior to any control transfer:
- before any indirect call or jmp
- before any return
Tags are pairs of 2-byte instructions that do not modify the system state (i.e., they are semantic nops).
- CALL_TAG: mov esi, esi; mov edx, edx # 0xd289f689
- RET_TAG: mov esi, esi; mov ecx, ecx # 0xc989f689
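As a sanity check on the encodings above, the following Python sketch packs the machine code of the two instruction pairs (`mov esi, esi` is `89 F6`, `mov edx, edx` is `89 D2`, `mov ecx, ecx` is `89 C9`) and confirms that, read as a little-endian 32-bit word, they yield the tag constants the verification routines compare against:

```python
import struct

# x86 machine code for the tag instruction pairs:
CALL_TAG_BYTES = bytes([0x89, 0xF6,   # mov esi, esi
                        0x89, 0xD2])  # mov edx, edx
RET_TAG_BYTES  = bytes([0x89, 0xF6,   # mov esi, esi
                        0x89, 0xC9])  # mov ecx, ecx

# Read each pair as a little-endian 32-bit word, exactly as the
# inserted "cmp dword ptr [...]" in the verifier would:
call_tag = struct.unpack("<I", CALL_TAG_BYTES)[0]
ret_tag  = struct.unpack("<I", RET_TAG_BYTES)[0]

print(hex(call_tag))  # 0xd289f689
print(hex(ret_tag))   # 0xc989f689
```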
Note that there is no verification routine emitted for any direct calls (e.g. 0x080485D6 call printf).
The blue box indicates the CALL_TAG and the orange box represents the RET_TAG.
The verification routine validates the tag before transferring control.
- Before any indirect call or jmp:
      mov eax, PTR                      # move target address
      cmp dword ptr [eax], 0xd289f689   # verify CALL_TAG
      jnz __magic_failed                # abort, if failed
      call dword ptr PTR                # jump to target
- Before any return:
      mov ecx, [esp]                    # move return address
      cmp dword ptr [ecx], 0xc989f689   # verify RET_TAG
      jnz __magic_failed                # abort, if failed
      retn                              # return
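The two verification routines can be modeled in a few lines of Python (a toy simulation, not the real instrumentation): a byte string stands in for memory, and a control transfer is allowed only when the 32-bit word at the target matches the expected tag.

```python
import struct

CALL_TAG = 0xD289F689
RET_TAG  = 0xC989F689

def read_dword(mem: bytes, addr: int) -> int:
    return struct.unpack_from("<I", mem, addr)[0]

def check_indirect_call(mem: bytes, target: int) -> bool:
    # Valid only if the target begins with CALL_TAG (a function entry).
    return read_dword(mem, target) == CALL_TAG

def check_return(mem: bytes, ret_addr: int) -> bool:
    # Valid only if the return address points at a RET_TAG (a call site).
    return read_dword(mem, ret_addr) == RET_TAG

# Toy address space: a function entry at 0, a post-call site at 8,
# and untagged bytes at 16 (e.g. the middle of a function).
mem = (struct.pack("<I", CALL_TAG) + b"\x90" * 4 +
       struct.pack("<I", RET_TAG)  + b"\x90" * 4 +
       b"\xCC" * 8)

print(check_indirect_call(mem, 0))   # True  - legitimate function entry
print(check_return(mem, 8))          # True  - legitimate call site
print(check_indirect_call(mem, 16))  # False - hijack attempt rejected
```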
For the proof-of-concept (PoC), we used x86 architecture with Linux programs that are compiled from C/C++ (using gcc and g++, respectively).
The following chart compares the performance overhead among different CFI implementations (MSR_CFI, Lehigh_DS, MSR_ARCH, IBMON), taking the worst and the best results for each experiment on SPEC 2000.
While we achieved a significant reduction in performance overhead, we need to discuss the security implications of PMCFI.
So, are we safe now?
– …well, we are definitely “safer” ;)
Control flow hijacking attacks are mostly mitigated:
- Can’t jump into the middle of functions (No ROP gadgets, at least in traditional sense)
- Can’t jump/call into arbitrary code (No JOP gadgets / function pointer overwrites)
However, this is not a panacea!
- Note that this scheme still allows an attacker to call/jump to any valid function and return to any call site.
- With a well-devised payload and environment, it may be possible to construct a very limited set of ROP gadgets (since the payload must respect tag consistency).
Hybrid Fuzz Testing: Discovering Software Bugs via Fuzzing and Symbolic Execution
Nothing, yet.
Platform-Independent Programs
This research aims to show that it is possible to generate a single binary payload that runs on multiple architectures without any modification. Creating such a payload is believed to be inherently difficult due to diverse instruction-set designs and encoding schemes. We developed techniques to solve this challenge and built an infrastructure for automatically generating platform-independent programs.
This was published in ACM CCS’10 (pdf)
We define key terms as follows:
- Platform: A hardware or emulated architecture; and/or an operating system.
- Program: A binary string/blob payload that is decoded to a valid and meaningful set of instructions for a platform.
By extension, a platform-independent program (PIP) is a program that runs on multiple (2 or more) platforms in a meaningful way without any modification.
Furthermore, we define PIP Generation Problem and introduce our solution.
- PIP Generation Problem
Given a list of n program-machine pairs (bi, mj), we automatically generate a single program b′ such that ∀(bi, mj): mj(bi) = mj(b′). In other words, the PIP generation problem takes in a list of programs and outputs a single PIP that will execute on all of the specified architectures. The result of executing the single PIP depends on the architecture it runs on. This means the PIP generation problem allows both the case where the final program bpip (b′ above) has the same semantics on all architectures and the case where it has different/independent semantics depending on the architecture that bpip executes on.
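The problem statement can be mirrored in a toy Python model (all names here are hypothetical, for illustration only): treating each "machine" as a function from a byte string to an observable result, a candidate b′ is a PIP exactly when every machine observes the same result on b′ as on its original program.

```python
# Toy model of the PIP property: a "machine" is a function from a byte
# string to an observable result; b' is a PIP for the pairs (b_i, m_j)
# iff m_j(b') == m_j(b_i) for every pair.
def is_pip(candidate: bytes, pairs) -> bool:
    return all(machine(candidate) == machine(prog) for prog, machine in pairs)

# Two fake "architectures" that decode the same blob differently:
machine_a = lambda b: b[0]    # arch A's behavior depends on the first byte
machine_b = lambda b: b[-1]   # arch B's behavior depends on the last byte

pairs = [(bytes([1, 0]), machine_a),   # desired behavior 1 on arch A
         (bytes([0, 2]), machine_b)]   # desired behavior 2 on arch B

b_pip = bytes([1, 2])                  # one blob satisfying both pairs
print(is_pip(b_pip, pairs))            # True
```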
- PIP Generation Solution
We introduce the concept of a gadget*, a binary payload that is valid for multiple platforms but whose behavior/semantics depend on the platform. By merging multiple gadgets, we can generate a platform-independent program with the desired behaviors. We also show that it is possible to reduce each gadget to a (semantically) instruction-level payload, which implies that we can build a compiler for a new language for platform-independent programs. Please refer to the paper for details on gadget structure.
*This is different from commonly known gadgets from Return Oriented Programming (ROP).
There are several security-critical implications of our techniques and implementation:
- Execution-based Steganography – A payload can be devised that acts differently depending on the running architecture or operating system. Even with behavioral analysis of binary programs for malware detection, it is difficult to find malicious behaviors embedded for an architecture that the analyzer does not cover.
- Malware bypassing Signature-based IDS – Similarly, static signature-based detection is difficult since the payload may not be flagged as malware on certain platforms. Even if a signature for the platform-independent malware is registered, it is (conceptually and with our tool) trivial to generate another PIP that looks very different in code/data while keeping the semantics the same.
- Platform-independent Shellcode – While most modern operating systems (and executable formats) verify that a program is suitable for running on the current platform/OS, shellcode payloads used in exploits are not prevented from running. Therefore, it is possible to craft shellcode that targets multiple platforms via PIP generation.
Here are some diagrams that depict our PIP concept:
As shown in Fig.1, the basic concept of a PIP gadget is simple. We generate a binary string that is decoded as zero or more (semantic) nop instructions followed by a branch instruction on each of the targeted platforms. This allows platform-specific logic to be placed at a certain offset. Note that, sometimes, two conditional branch instructions with mutually exclusive conditions (meaning that only one of them will be executed) can be used (see ARM decode).
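The layout idea can be sketched in Python (the header byte values and helper names below are made up for illustration; real gadget headers come out of the Admissible PI Generation task described later): the shared header occupies the start of the gadget, and each platform's logic is placed at the offset that its decoded branch lands on.

```python
def build_gadget(header: bytes, payloads: dict) -> bytes:
    # payloads maps each platform's branch-landing offset -> its code bytes.
    gadget = bytearray(header)
    for offset, code in sorted(payloads.items()):
        if len(gadget) < offset:
            gadget.extend(b"\x00" * (offset - len(gadget)))  # pad up to offset
        gadget[offset:offset + len(code)] = code
    return bytes(gadget)

# Hypothetical 4-byte header: imagine platform A decodes it as
# "nop; branch +8" and platform B as "branch +16" (bytes are made up).
header = b"\x01\x02\x03\x04"
gadget = build_gadget(header, {8:  b"AAAA",   # platform-A logic at offset 8
                               16: b"BBBB"})  # platform-B logic at offset 16
print(len(gadget))  # 20
```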
To dive a bit deeper into implementation details, we can separate PIP generation into two phases and four tasks, depending on what we want to do.
In the pre-computation phase, we perform two tasks: Admissible PI Generation and PI Translation. Admissible PI Generation produces a list of gadget headers for the given ISAs. PI Translation generates an instruction mapping among different ISAs, such that a given program for one platform can be translated into a program for another platform.
In the next phase, we can generate platform-independent programs in a couple of different ways. The first method utilizes the gadget headers produced in the pre-computation phase: by placing the platform-specific programs at the correct offsets determined by the gadget header, we can generate one or more gadgets needed to produce a complete program. Alternatively, we can translate a given platform-specific program for the targeted platforms and generate gadgets from the results. After all of the gadgets are produced, we finally merge them and fix offsets to produce the PIP.
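The merge-and-fix-offsets step can be sketched as follows (a hypothetical toy representation, not the paper's actual data structures): gadgets are concatenated, and each gadget's recorded branch fields are patched with the offset where the next gadget begins.

```python
import struct

def chain(gadgets):
    # gadgets: list of (body, fixup_offsets). Each fixup marks a 4-byte
    # field inside the body that must receive the absolute offset of the
    # *next* gadget's start once everything is concatenated.
    blob = bytearray()
    starts = []
    for body, _ in gadgets:
        starts.append(len(blob))
        blob.extend(body)
    starts.append(len(blob))  # one past the last gadget (program end)
    for (body, fixups), base, nxt in zip(gadgets, starts, starts[1:]):
        for off in fixups:
            struct.pack_into("<I", blob, base + off, nxt)  # patch branch field
    return bytes(blob)

g1 = (bytearray(b"\x90" * 4 + b"\x00" * 4), [4])  # branch field at offset 4
g2 = (bytearray(b"\x90" * 8), [])                 # final gadget, no fixups
pip = chain([g1, g2])
print(struct.unpack_from("<I", pip, 4)[0])  # 8 -> g1 branches to g2's start
```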
Fig.3 shows an example of a single PIP gadget with 3 platforms (x86, MIPS, and ARM), each with different logic.
Fig.4 represents multi-gadget chaining with two platforms trying to achieve the same semantics (using translation) through (semantically) instruction-level gadgets.