Google CTF 2018 – Tape

The description of this challenge is: “We’ve found this priceless, old magnetic tape in our archives. We dumped the contents, but the data is corrupted and we can’t read it. We only have this old tape reader tool, but source code was lost long ago. The program has only 944 bytes, so reversing it should be trivial, right?”

Playing around with the tape reader

We’re given two files: “FLAGZ.DAT”, which contains binary data (and is presumably the tape mentioned in the description), and “tapereader”, a stripped ELF file.

First, we can just directly run tapereader on FLAGZ.DAT and see what happens. We used QEMU because tapereader uses the ARM architecture; running cat FLAGZ.DAT | qemu-arm tapereader prints out the following:

(screenshotted to show color)

Well, this tape is certainly corrupted! There is also a message at the bottom saying that there were “checksum errors”:

This implies that the tapereader binary somehow computes checksums from the tape data, and that if we can fix all the errors in the tape to make all the checksums work out, then the flag should hopefully be readable. One interesting observation here is that once an error occurs somewhere, all of the characters below it in the column also have errors, which implies that fixing a character should also fix the rest of the column (at least up to the next error). Also, all of the text seems to be taken directly from the Wikipedia article on Error detection and correction so it seems like it’ll be pretty easy to determine the plaintext. Let’s take a look at how the data in FLAGZ.DAT is represented:

As we can see, the first 80 characters of this file exactly match the first line of the tape reader output (each line is 80 characters long). Later on in the file we can make out some words, but nothing else seems to match exactly. However, there are a bunch of 1a bytes shortly after the first line which seem to correspond to the spaces in the second line, which makes the 7f68687568 correspond to ERROR (and a bunch of other characters correspond to the remaining DETECTION AND CORRECTION). What transformation turns 1a to space and 7f68687568 to ERROR? One of the most obvious things to try is XORing, and it turns out that XORing all of these bytes with 3a gives the desired characters.

Also, 3a corresponds to the character :, which is interestingly the same character in the first line! From here, it’s not too hard to guess and verify that the tape reader simply XORs a given character in the file with all of the characters above it in the same column to make the displayed character. This also completely explains the “an error propagates down the column” behavior mentioned earlier. Some other notes:

  • There are 8 seemingly random unused bytes between each 80-byte line (e.g. the first one is 186f4c61f89e5280); we can assume these are the checksums that are being checked by the tape reader.
  • All of the corrupted bytes are replaced with FFs.

At this point in the challenge, we were feeling pretty good about ourselves. Now that we knew exactly how the tape data corresponded to the displayed characters, we could just turn all of the FFs into the bytes that will make the text match that on the Wikipedia page. There were only 190 such corrupted bytes, so we wrote a script to aid in manually displaying the corrupted bytes and entering the correct ones. We started patting ourselves on the back as we were certain we would get the flag without even needing to reverse the tape reader, when we reached the bottom and were punished for our hubris:

(zooming in on the corrupted bytes in the hexdump)

When we saw this, we realized that we’d been completely bamboozled: there was no way to use context clues to guess the missing characters in the goo.gl link or in the flag, and we couldn’t even use the characters below the missing characters to help because the next line has missing characters in all the same positions!

At this point, we spent a while trying to use various black box methods of guessing the characters because no one wanted to actually reverse the ELF file. We tried things like brute-forcing and checking the number of green characters in the result (assuming that green is generally good), or writing a script to check whether a certain goo.gl link went to the relevant Wikipedia page, but in the end the fact remained that brute-forcing 6-7 characters of the goo.gl link was really hard, and there was absolutly no way we could brute-force 10 bytes of the flag. Eventually, we decided to bite the bullet and start looking at the tape reader so we could figure out how the checksum was computed.

Reversing the checksum algorithm

The first and biggest obstacle was that none of us were familiar with the ARM architecture, so we spent quite a bit of time reading up on how it works. We found the documents here and here (the “Introduction to ARM Assembly” section) both quite helpful. If you’re not that familiar with ARM, it would probably be useful to read up on the basics before continuing, though I will try to explain the relevant parts as they come up.

Anyways, let’s open this up in IDA. Here’s the start of the file:

The very first instruction moves an odd address into the program counter, which changes execution from ARM to Thumb mode, which is basically a more compact instruction set where each instruction is 16 bits instead of ARM’s 32 bits. It seems like IDA can’t really tell which parts are Thumb and which parts are ARM, so you have to use the “change segment register value” command (keyboard shortcut Alt-G) to set it manually.

After that, we can pretty easily follow the next instructions, which loads values from the middle of the file into registers, eventually setting R5 to 0x202f0, an address in the middle of the BSS section, and putting it on the stack at SP+4. It then loads the values 0xd7870f42 and 0xc96c5795 into R6 and R7. After this comes a loop:

Some notes on what this does:

  • There are two loops. The outer loop is from 0x1008c to 0x100d0 and repeats 0x100=256 times.
  • The inner loop is from 0x10098 to 0x100b8. LR is set to 8 at 0x10094 and is decremented each iteration, so this loop repeats 8 times.
  • This inner loop essentially transforms the values in R2 and R3 by dividing them by 2 (LSR and RRX are both right shifts), and then XOR-ing them with the values in R6 and R7 depending on whether R2 is odd.
  • The STMIA instruction at 0x100cc consecutively stores the values in R2 and R3 in the 8 bytes starting at the address in R5 (which as established earlier is in the BSS section), and then increments R5 by 8.
  • Putting it all together, the whole thing basically loads a length-256 array, each element consisting of two 4-byte words, into the BSS. We’ll soon see what this array is used for.

After this loop comes the part of the code that reads in the tape file. We never ended up figuring out how this part worked because we had lots of trouble understanding ARM supervisor calls. However, we were able to skip it and just go the part of the program that was computing the checksum, at 0x10178 (thankfully, 944 bytes of code is a manageable amount so it wasn’t too hard to find):

Some notes on what this does:

  • R2 and R3 represent the current “checksum” of the line. Characters are read in one at a time, each of which modifies the checksum, until we reach the end of the line.
  • R12 stores the index of the current character in the line (as can be guessed from the CMP R12, #0x50 instruction; recall that there are 0x50=80 characters per line). This means that the LDRB R1, [R9, R12] instruction loads the current character into R1. We don’t know exactly what R9 is but we can assume from context that it’s pointing to an 80-character line of input.
  • Recall that SP+4 contains the BSS address of the 256 pairs of 4-byte words. This means that the instructions from 0x10188 to 0x10198 take the character from the line, XOR it with the last byte of R2, and then use the result to index into this length-256 array, storing the result in R6 and R7 (LDMIA is basically the inverse of STMIA). Note that LSL#3 multiplies by 8, the number of bytes in each element of the array, which is how the array indexing is done.
  • After that, the values in R2 and R3 are bit shifted in some way, and are XORed with the values in R6 and R7.
  • This “read character, XOR, look up in array, shift some bits, XOR result with intermediate checksum” process is repeated for each of the 80 characters in the line to get the final checksum. In the 0x101c8 and 0x101cc instructions, we can see this checksum compared to another address stored in SP+8, which is presumably where the 8 byte checksum in the flag file is stored.

Finally, this took us a while to notice, but the weird shifts in 0x10194-0x101a8 are a lot more understandable if, instead of considering R2 and R3 as two separate parts of a checksum, we actually consider it as a single 64-bit number, with R2 being the 4 low bytes and R3 being the 4 high bytes (basically, this program is implementing 64-bit integers in a 32-bit architecture). Below is a step-by-step visualization of this, where R2[3] through R2[0] represent the 4 bytes of R2 from most to least significant (same with R3), and R6 is shorthand for R6[3] R6[2] R6[1] R6[0] (same for R7).

start checksum = R3 R2 = R3[3] R3[2] R3[1] R3[0] R2[3] R2[2] R2[1] R2[0]

0x1019c       MOV R0, R2, LSR#8 (LSR#8 = shift right by 8 bits)
R0 = R2[3] R2[2] R2[1]
0x101a4       ORR R0, R0, R3, LSL#24 (shift left by 24 bits)
R0 = R3[0] R2[3] R2[2] R2[1] 
(note that registers are 32 bits, so R3 shifted left by 24 gives R3[0] 0 0 0)
0x101a8       MOV R1, R3, LSR#8
R1 = R3[3] R3[2] R3[1]
0x101b0       EOR R2, R6, R0
R2 = R0 ^ R6 = (R3[0] R2[3] R2[2] R2[1]) ^ R6 
0x101b4       EOR R3, R7, R1
R3 = R1 ^ R7 = (R3[3] R3[2] R3[1]) ^ R7

end checksum = R3 R2 = (R3[3] R3[2] R3[1] R3[0] R2[3] R2[2] R2[1]) ^ (R7 R6)

As we can see, the 64-bit checksum simply ends up being shifted 8 bits right and XOR-ed with another 64-bit integer. To sum it all up, here is the overall checksum algorithm:

  1. Initialize the 64-bit integer current_checksum to 0. This integer is represented by R3 in the high bytes and R2 in the low bytes.
  2. Read in a character.
  3. XOR this character with the last byte of current_checksum. Take the resulting number and index that into the array to read two 32-bit integers into R6 and R7. This can also be considered as a 64-bit integer with R7 in the high bytes and R6 in the low bytes, and let’s call this integer x.
  4. Shift current_checksum right 8 bits and XOR the result with x to get the new current_checksum.
  5. Repeat steps 2-4 for each of the 80 characters, the resulting current_checksum is the final checksum.

Similarly, the array generation above can also be viewed in terms of 64-bit integers. Here is the Python program we wrote to implement the array generation as well as the checksum algorithm (which we named crc):

Edited to add: as pointed out in the comments, this is actually just the CRC algorithm specified here, and the polynomial is the same as that of CRC-64-ECMA.

import binascii
import copy
import itertools
import struct
import sys

def to_str(l):
	return ''.join(map(chr, l))

def make_table():
	table = []
	r = 0xc96c5795d7870f42

	for r1 in range(256):
		k = r1
		for _ in range(8):
			r10 = k & 1
			k >>= 1
			if r10:
				k ^= r
		table.append(k)
	return table
table = make_table()

def crc(s):
	assert len(s) % 80 == 0
	ret = []
	s = [s[i:i+80] for i in range(0, len(s), 80)]
	ss = '\0' * 80
	for line in s:
		# XOR with previous characters in the same column
		ss = [chr(ord(a) ^ ord(b)) for a,b in zip(ss, line)]
		# Compute the checksum for the line
		current_checksum = 0
		for c in ss:
			r1 = ord(c)
			r1 ^= current_checksum
			r1 &= 0xff
			x = table[r1]
			current_checksum >>= 8
			current_checksum ^= x
		ret.append(struct.pack('<Q', k))
	return ''.join(ret)

if __name__ == '__main__':
	sys.stdout.write(crc(sys.stdin.read()))

We can verify that this works by running this on the first several lines of the corrected FLAGZ.DAT file and checking that the generated checksums exactly match those in the file. For example, the checksum of the first line, " .::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::. " is indeed “\x18\x6f\x4c\x61\xf8\x9e\x52\x80”, which matches the 8 bytes in FLAGZ.DAT.

A few more miscellaneous notes:

  • We had some trouble with getting the make_table function to produce the correct results. At first we tried to just run tapereader in GDB and grab the values from memory, but we couldn’t get GDB to work with both ARM and Thumb mode for some unknown reason (it just assumed everything was ARM by default, and there was an option that caused it to assume everything was Thumb, which didn’t work with the program constantly switching between the two).
  • Earlier, I kind of glossed over the instructions 0x100a0 MOVS R3, R3, LSR#1 and 0x100a4 MOV R2, R2, RRX. To explain this a little more, the MOVS instruction is the same as MOV except it sets flags, and RRX is a rotate right including the carry flag. When R3 is shifted right by 1 in the first instruction, the byte that is shifted out is therefore stored in the carry flag, which is then shifted into R2 by the second instruction. Basically, this is how they implemented shifting the 64-bit integer R3 R2 right 1 bit.
  • At first, we weren’t sure if the checksum was computed with the bytes directly in the tape file, or with the bytes after being XORed with the previous bytes in the same column. We just checked the checksum both ways, and it turned out to be the latter.

Figuring out the corrupted bytes

To recap, we now know exactly how the checksum algorithm works, and we want to fill in 7 missing characters of a line so that the resulting checksum matches some given checksum (and the same thing with 10 missing characters in the next line). Luckily, most of the hard work is behind us and the rest of it turns out to be a fairly straightforward math/crypto problem.

Because the checksum is computed one step at a time, one potential idea is using some meet-in-the-middle attack to brute-force the missing bytes. Let’s assume that the checksum function is reversible (i.e. right now, given a checksum and a character, we can compute the next checksum, so let’s assume that given the “next checksum” and a character, we can go backwards and compute the previous checksum). Then we could use the following strategy when there are 10 missing charcters:

  1. Iterate through all possible values for the first 5 missing characters, and for each one, compute the intermediate checksum right after the last of those characters.
  2. Iterate through possible values for the the last 5 missing characters, and for each one, compute the intermediate checksum right before the first of those characters. Assuming that the checksum function is reversible, we can do this by simply starting with the final checksum and moving backwards.
  3. Compute the intersection of the two sets generated in steps 1 and 2. If there’s only one checksum in both sets, it uniquely determines the first 5 and last 5 characters (assuming that there are no collisions), so we’re done. Also, there better only be one checksum in both sets, because otherwise the flag wouldn’t be unique.

This strategy would require computing and comparing around 2×625≈2 billion intermediate checksums, which is definitely feasible in a language like C++. So, all that’s left is figuring out how to step backwards in the checksum function. Let’s take a look at the checksum algorithm again, in pseudocode.

Inputs: intermediate checksum, character
next_checksum(int64 current_checksum, char c):
    c2 = last byte of current_checksum ^ c
    x = table[c2]
    return (current_checksum >> 8) ^ x

If we want to go backwards, then we need to determine what number was selected from the table. How can we determine that given the final checksum? One observation is that the first byte of the return value, (current_checksum >> 8) ^ x, is also the first byte of x. So we could implement the reverse function like this:

Inputs: intermediate checksum, character
previous_checksum(int64 current_checksum, char c):
    first_byte = first byte of current_checksum
    c2 = ?
    x = ?
    for i in range(256):
        if the first byte of table[i] == first_byte:
            c2 = i
            x = table[i]
            break
    last_byte = c2 ^ c
    return ((current_checksum ^ x) << 8) | last_byte 

Of course, this would only work if there were a unique table[i] for each of the 256 possible first_bytes, but luckily for us, this turns out to be the case. There is almost certainly some mathematical reason why the values generated by make_table all have distinct first bytes, because it would be too big of a coincidence otherwise, but we haven’t thought too much about it (and it’s not important to solving the challenge anyways).

Here is the script we wrote to implement the meet-in-the-middle attack, cleaned up a little and commented:

#include
using namespace std;

unsigned long long table[256];
char invtable[256];
unsigned long long t_r = 0xc96c5795d7870f42;

const long long SIZE = 1LL << 30;
unordered_set inters;

unsigned long long intermediate;

string valid = ":.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Contents of the goo.gl line before and after the missing 7 characters.
// string s1 = ": Anyway, this was interesting,  but if you want more,  go read goo.gl/";
// string s2 = " :"; 

// Contents of the flag line before and after the missing 10 characters.
// Note that we could only determine 7 bytes in s2 after figuring out the goo.gl line.
string s1 = ": You probably just want the flag.  So here it is: CTF{dZXi";
string s2 = "PIUTYMI}. :";

// Given an intermediate checksum k and next character c, returns the next checksum.
unsigned long long forward(unsigned long long k, char c){
	c ^= k;
	unsigned idx = c;
	idx &= 0xff;
	unsigned long long r = table[idx];
	k >>= 8;
	k ^= r;
	return k;
}

// Given an intermediate checksum k and previous character c, returns the previous checksum.
unsigned long long backward(unsigned long long k, char c){
	char v = invtable[k >> 56];
	unsigned idx = v;
	idx &= 0xff;
	k ^= table[idx];
	k <<= 8;
	k |= c ^ idx;
	return k;
}

char result[80];
// Given an intermediate checksum k1, moves forward through all length-len character
// sequences and adds all resulting checksums to inters.
void forwardscan(unsigned long long k1, int len){
	if(len == 0) {
		inters.insert(k1);
		return;
	}
	for(char c : valid) {
		result[len-1] = c;
		forwardscan(forward(k1, c), len-1);
		if(len == 5) printf("%s\n", result);  // print a status update
	}
}

// Given an intermediate checksum k2, moves backward through all length-len character
// sequences and checks whether any of the resulting checksums are in inters. If it
// successfully finds an intersection, stores it in "intermediate".
void backwardscan(unsigned long long k2, int len){
	if(len == 0) {
		if(inters.count(k2)){
			intermediate = k2;
			printf("Found %llx\n", k2);
		}
		return;
	}
	for(char c : valid) {
		result[len-1] = c;
		backwardscan(backward(k2, c), len-1);
		if(len == 5) printf("%s\n", result);  // print a status update
	}
}

// Prints all character sequences of length len that take a starting checksum k1 to an
// ending checksum k2.
void check(unsigned long long k1, unsigned long long k2, int len){
	if(len == 0) {
		if(k1 == k2) printf("answer: %s\n", result);
		return;
	}
	for(char c : valid) {
		result[len-1] = c;
		check(forward(k1, c), k2, len-1);
		if(len == 5) printf("%s\n", result);  // print a status update
	}
}

int main(){
	inters.reserve(SIZE);

	// Generate the table, as well as the inverse table of first bytes to indexes.
	for(unsigned i=0; i<256; ++i){
		unsigned long long k = i;
		for(int _i=0; _i>= 1;
			if(r10) k ^= t_r;
		}
		table[i] = k;
		invtable[k >> 56] = i;
	}

	// Move k1 forward through the characters in s1.
	unsigned long long k1 = 0;
	for(unsigned i=0; i<s1.length(); ++i){
		char c = s1[i];
		k1 = forward(k1, c);
	}

	// unsigned long long k2 = 0xa6e52257d54b39c5; // c5394bd55722e5a6, checksum for the goo.gl line
	unsigned long long k2 = 0x30d498cbfb871112; // 121187fbcb98d430, checksum for the CTF{...} line
	// Move k2 backward through the characters in s2.
	for(unsigned i=0; i<s2.length(); ++i){
		char c = s2[s2.length()-1-i];
		k2 = backward(k2, c);
	}

	// Check forwards and backwards 5 characters to find the unique checksum in the middle, which
	// gets stored in "intermediate".
	forwardscan(k1, 5);
	backwardscan(k2, 5);

	printf("Starting research\n");

	// Find the 5 characters that take k1 to intermediate, as well as intermediate to k2.
	check(k1, intermediate, 5);
	check(intermediate, k2, 5);
}

Note that we actually first figured out the missing 7 characters in the goo.gl line by brute-forcing 4 and 3 characters on either side, so I’ve left in some of that as comments in case you want to try running it yourself. We need to know these 7 characters before we can determine the missing 10 characters in the next line. (it turned out to be goo.gl/hsy11d, which does indeed link to the same Wikipedia article the rest of the text is from)

We also had a bit of trouble running this script at first because our laptop only had 8 GB of RAM and storing a billion 64-bit integers needs a little more than that. Luckily, we had a teammate with access to a machine with around 100 gigs, so we were able to run it on that. It ended up taking a very reasonable ~10 minutes, and did indeed print out the 10 missing characters we needed to complete the flag, which was CTF{dZXicOXLaMumrTPIUTYMI}.

A few final notes:

  • We spent a really, really long time on this challenge (two of us worked on it for ~14 hours total, and several more for a few hours). While it was frustrating at times, especially when we realized we had to learn about ARM, in the end it was a very satisfying and interesting challenge and I’m glad I worked on it.
  • We had many issues while writing the above C++ script, including many silly things like mistranscribing some checksums. It helped a lot that we were already provided many examples of rows with checksums that we could use to debug our code.
Advertisements

2 thoughts on “Google CTF 2018 – Tape

  1. Sounds cool!
    But is there any way solving it without a 100GB machine?
    If you didn’t notice, the polynomial used in the tapereader is of CRC-64-ECMA (check out Wikipedia’s article about CRC) – so maybe there is a simpler algorithm for error correction?
    Thanks!

    Like

    1. Oh, wow, I never noticed that the checksum algorithm is a direct implementation of CRC (I think the person who wrote the code did and we didn’t communicate very well). I’ll edit the post to add that.

      I’m not sure if knowing that it’s CRC-64-ECMA actually helps, because CRC is only an error-detecting (not error-correcting) code, and googling “CRC error correction” and similar things doesn’t seem to give much. However, 100 GB is by no means necessary: I think it ended up using around 12 GB, which is a lot more reasonable. If we didn’t have access to such a machine, we could have split the search into 2-3 parts and it should have been fine.

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s