This crypto challenge was somewhat strange in that it took the form of a scrambled audio file. The challenge description tells us the file was formed by taking an audio recording of someone speaking (specifically, saying “The flag is”, pausing for 1-2 seconds, and then reading out the flag) XORed with a linear-feedback shift register. We’re also told that the LFSR has at most 32 bits of state.
Interestingly, we were the only team to solve this challenge! We probably wouldn’t have gotten it without the hint (although we did have some of the correct ideas), but with the hint it was fairly straightforward conceptually, so it seems a bit surprising more teams didn’t solve it.
First, let’s talk about LFSRs. As more of a CS/math person than an EE/signals person, I like to think of an LFSR as a long pseudo-random bit sequence generated by a linear recurrence relation over . For example, an LFSR with 4 bits of state can be generated by the recurrence , along with four seed bits through . An important property of such sequences is that they are not cryptographically secure; if an LFSR has bits of state, there exists a polynomial time algorithm (the Berlekamp-Massey algorithm) that takes in a consecutive bits from this sequence and can recover the recurrence defining the LFSR (essentially by solving some system of linear equations).
From this fact, there is a fairly obvious first attempt to try: take at least 64 consecutive bits of the audio file from where we think the pause is, run it through the Berlekamp-Massey algorithm to recover the LFSR, and descramble the audio file using this LFSR. If the (unscrambled) audio file has value 0 during the pause, then the corresponding bits in the scrambled audio file will be exactly the value of the LFSR, so we can recover the LFSR this way. Unfortunately, implementing this we don’t get any reasonable audio files (in particular the LFSR does not agree with the rest of the pause).
The second most optimistic thing one could hope for is that, if the audio file isn’t uniformly zero in the pause, perhaps it is uniformly some other constant (for instance, if it is encoded using 16-bit LPCM, it’s possible each sample has value when there is no audio). Since the audio has 16 bits per sample, this would mean the audio file is periodic every 16-bits, and therefore that would remove the contribution from the audio file and satisfy the recurrence of the LFSR. We ran the Berlekamp-Massey algorithm on these values, but again failed to get any valid audio files.
This leaves the conclusion that the audio file must not be perfectly silent during the pause, and there are small amounts of ambient noise. At this point, we got a bit stuck, since recovering the recurrence of an LFSR with noise is much harder than just running Berlekamp-Massey. One approach we played with was enumerating all seeds and LSFRS, and seeing which sequence fit the pause the closest; unfortunately, even with 16-bit LFSRs, there are such sequences, and this was taking a long time to run in python. Another approach we thought about was to brute force 64 consecutive bits of the LSFR based on the fact that the noise was bounded, and then run Berlekamp-Massey to recover the LFSR. Unfortunately, we didn’t know how big the noise would be; if the noise could take on different values, this approach would require iterations of Berlekamp-Massey (64 bits required/16 bits per sample). If was around 100 or so, then this would be computationally prohibitive; at this point we ended up taking a break and looking at other problems.
It turns out that this was in fact the correct approach. The hint posted the next day stated that the noise was usually in the range , corresponding to . In this case , so iterations of Berlekamp-Massey is not too bad. (In retrospect, perhaps we should have realized the noise would be small because sound generally scales logarithmically).
Implementing this eventually lead us to the decrypted audio file, but we ran into a large number of bugs along the way (mostly the result of coding this up at 3 AM). These included:
- Converting a byte to its 8 bits in Python by running ‘bin(b)[2:]’.
- Little-endian means the bytes are in the reverse order, not the bits.
- (This one was arguably not our fault!) Apparently, .WAV files encoded in 16-bit LPCM encode the values of their samples as bits in 2’s complement notation. This was confusing, since the example on the Wikipedia page for LPCM shows values being encoded by simply shifting all values upwards until they are unsigned (i.e., adding to samples that might lie in the range ). Moreover, this is actually how 8-bit LPCM .WAV files are encoded! We ended up having to ask an admin about this, who gave us the very useful hint of looking at an actual .WAV file (this also helped with debugging the other issues).
Here is the script we ended up using (warning: I added some comments, but it’s probably still kind of confusing):
import numpy import copy from itertools import product # Berklekamp-Massey Python implementation from https://gist.github.com/StuartGordonReid/a514ed478d42eca49568 # l is the LFSR length # c is a list of coefficients of the characteristic polynomial def berlekamp_massey_algorithm(int_data): n = len(int_data) c = numpy.zeros(n) b = numpy.zeros(n) c, b = 1, 1 l, m, i = 0, -1, 0 while i < n: v = int_data[(i - l):i] v = v[::-1] cc = c[1:l + 1] d = (int_data[i] + numpy.dot(v, cc)) % 2 if d == 1: temp = copy.copy(c) p = numpy.zeros(n) for j in range(0, l): if b[j] == 1: p[j + i - m] = 1 c = (c + p) % 2 if l <= 0.5 * i: l = i + 1 - l m = i b = temp i += 1 return l, c # generates LFSR bits def gen_block(seed, taps, L): block = [x for x in seed] for _ in xrange(L): nxt = 0 for tap in taps: nxt ^= block[-tap] block.append(nxt) return block # converts an integer in the range [-2^15,2^15-1] to its 16-bit WAV representation def to_bin(val): if val < 0: val = 2**16+val ans =  for i in xrange(16): ans.append(val%2) val/=2 ans.reverse() return ans[8:] + ans[:8] # inverse of to_bin def to_val(block): block = block[8:] + block[:8] block.reverse() val = sum([x*2**i for i, x in enumerate(block)]) if val >= 2**15: return val - 2**16 return val MIN_NOISE = -8 MAX_NOISE = 8 TEST_LEN = 160 def analyze_block(block): for noises in product(range(MIN_NOISE, MAX_NOISE+1), repeat=4): bnoises = map(to_bin, noises) bnoise = bnoises + bnoises + bnoises + bnoises # bnoise is our noise values guess lchunk = block[:64] lchunk = [b1^b2 for b1, b2 in zip(bnoise, lchunk)] # xor with encrypted data to get LFSR bits _, lfsr = berlekamp_massey_algorithm(lchunk) taps = [i for i, v in enumerate(lfsr) if v==1] # run Berlekamp-Massey to determine LFSR that gives those 64 bits testblock = gen_block(lchunk, taps, TEST_LEN-64) # extend the 64 LFSR bits to TEST_LEN bits... descrambled = [b1^b2 for b1, b2 in zip(block[:TEST_LEN], testblock)] # ...and xor with encrypted data to get decrypted audio data samples = [abs(to_val(descrambled[16*i:16*(i+1)])) for i in xrange(len(descrambled)/16)] # now we need to check if this decrypted audio data is correct mean = sum(samples)/(1.*len(samples)) # print '---', noises, mean if mean < 16: # check if it's mostly silent (if the LFSR were wrong, then the values should be random, so mean would be around ~2^14) print 'WOW' print lchunk, lfsr, taps # if it is silent, then we've likely found the right LFSR! data = open('eavesdropped_radio', 'rb').read() # LPCM with a sample rate of 16 kHz and 16 bits per sample (little endian) # converts byte to 8 binary digits...yes, we did name this "blah" def blah(x): bad = bin(ord(x))[2:] better = '0'*(8-len(bad)) + bad assert len(better) == 8 return better bindata = ''.join(blah(x) for x in data) bindata = [0 if x == '0' else 1 for x in bindata] print len(bindata) # 0.5 seconds = 0.5*16000*16 = 128000 bits BLOCK_SIZE = 128000 BLOCK_WIDTH = 1000 # loops through the data in .5 second increments and brute-forces noise values as described above """ for i in xrange(6*BLOCK_SIZE, len(bindata), BLOCK_SIZE): print 'checking block ', i/BLOCK_SIZE block = bindata[i:i+BLOCK_WIDTH] print i, hex(i/8) # print block print [to_val(block[j:j+16]) for j in xrange(0, len(block), 16)] analyze_block(block) """ # now that we've found the correct LFSR, we can decrypt the rest of the file start = 6*BLOCK_SIZE seed = [0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1] taps = [0, 1, 2, 22, 32] prg = gen_block(seed, taps, len(bindata) - start) print 'done generating block' unscrambled = [b1^b2 for b1, b2 in zip(prg, bindata[start:])] to_byte = lambda a: sum([c*2**(7-i) for i, c in enumerate(a)]) ndata = ''.join(chr(to_byte(unscrambled[i:i+8])) for i in xrange(0,len(unscrambled), 8)) open('uneavesdropped', 'wb').write(ndata)