SharifCTF 7 – Radio Intelligence

This crypto challenge was somewhat strange in that it took the form of a scrambled audio file. The challenge description tells us the file was formed by taking an audio recording of someone speaking (specifically, saying “The flag is”, pausing for 1-2 seconds, and then reading out the flag) XORed with a linear-feedback shift register. We’re also told that the LFSR has at most 32 bits of state.

Interestingly, we were the only team to solve this challenge! We probably wouldn’t have gotten it without the hint (although we did have some of the correct ideas), but with the hint it was fairly straightforward conceptually, so it seems a bit surprising more teams didn’t solve it.

First, let’s talk about LFSRs. As more of a CS/math person than an EE/signals person, I like to think of an LFSR as a long pseudo-random bit sequence generated by a linear recurrence relation over \mathbb{F}_2. For example, an LFSR with 4 bits of state can be generated by the recurrence b_{n} = b_{n-1} + b_{n-3} + b_{n-4}, along with four seed bits b_{0} through b_{3}. An important property of such sequences is that they are not cryptographically secure; if an LFSR has b bits of state, there exists a polynomial time algorithm (the Berlekamp-Massey algorithm) that takes in a consecutive 2b bits from this sequence and can recover the recurrence defining the LFSR (essentially by solving some system of linear equations).

From this fact, there is a fairly obvious first attempt to try: take at least 64 consecutive bits of the audio file from where we think the pause is, run it through the Berlekamp-Massey algorithm to recover the LFSR, and descramble the audio file using this LFSR. If the (unscrambled) audio file has value 0 during the pause, then the corresponding bits in the scrambled audio file will be exactly the value of the LFSR, so we can recover the LFSR this way. Unfortunately, implementing this we don’t get any reasonable audio files (in particular the LFSR does not agree with the rest of the pause).

The second most optimistic thing one could hope for is that, if the audio file isn’t uniformly zero in the pause, perhaps it is uniformly some other constant (for instance, if it is encoded using 16-bit LPCM, it’s possible each sample has value 2^{15} when there is no audio). Since the audio has 16 bits per sample, this would mean the audio file is periodic every 16-bits, and therefore that b_{n} + b_{n-16} would remove the contribution from the audio file and satisfy the recurrence of the LFSR. We ran the Berlekamp-Massey algorithm on these values, but again failed to get any valid audio files.

This leaves the conclusion that the audio file must not be perfectly silent during the pause, and there are small amounts of ambient noise. At this point, we got a bit stuck, since recovering the recurrence of an LFSR with noise is much harder than just running Berlekamp-Massey. One approach we played with was enumerating all seeds and LSFRS, and seeing which sequence fit the pause the closest; unfortunately, even with 16-bit LFSRs, there are 2^{31} such sequences, and this was taking a long time to run in python. Another approach we thought about was to brute force 64 consecutive bits of the LSFR based on the fact that the noise was bounded, and then run Berlekamp-Massey to recover the LFSR. Unfortunately, we didn’t know how big the noise would be; if the noise could take on N different values, this approach would require N^4 iterations of Berlekamp-Massey (64 bits required/16 bits per sample). If N was around 100 or so, then this would be computationally prohibitive; at this point we ended up taking a break and looking at other problems.

It turns out that this was in fact the correct approach. The hint posted the next day stated that the noise was usually in the range [-8, 8], corresponding to N=17. In this case N^4 \approx 84000, so N^4 iterations of Berlekamp-Massey is not too bad. (In retrospect, perhaps we should have realized the noise would be small because sound generally scales logarithmically).

Implementing this eventually lead us to the decrypted audio file, but we ran into a large number of bugs along the way (mostly the result of coding this up at 3 AM). These included:

  1. Converting a byte to its 8 bits in Python by running ‘bin(b)[2:]’.
  2. Little-endian means the bytes are in the reverse order, not the bits.
  3. (This one was arguably not our fault!) Apparently, .WAV files encoded in 16-bit LPCM encode the values of their samples as bits in 2’s complement notation. This was confusing, since the example on the Wikipedia page for LPCM shows values being encoded by simply shifting all values upwards until they are unsigned (i.e., adding 2^{15} to samples that might lie in the range [-2^{15}, 2^{15}-1]). Moreover, this is actually how 8-bit LPCM .WAV files are encoded! We ended up having to ask an admin about this, who gave us the very useful hint of looking at an actual .WAV file (this also helped with debugging the other issues).

Here is the script we ended up using (warning: I added some comments, but it’s probably still kind of confusing):

import numpy
import copy
from itertools import product

# Berklekamp-Massey Python implementation from https://gist.github.com/StuartGordonReid/a514ed478d42eca49568
# l is the LFSR length
# c is a list of coefficients of the characteristic polynomial
def berlekamp_massey_algorithm(int_data):
    n = len(int_data)
    c = numpy.zeros(n)
    b = numpy.zeros(n)
    c[0], b[0] = 1, 1
    l, m, i = 0, -1, 0
    while i < n:
        v = int_data[(i - l):i]
        v = v[::-1]
        cc = c[1:l + 1]
        d = (int_data[i] + numpy.dot(v, cc)) % 2
        if d == 1:
            temp = copy.copy(c)
            p = numpy.zeros(n)
            for j in range(0, l):
                if b[j] == 1:
                    p[j + i - m] = 1
            c = (c + p) % 2
            if l <= 0.5 * i:
                l = i + 1 - l
                m = i
                b = temp
        i += 1
    return l, c

# generates LFSR bits
def gen_block(seed, taps, L):
    block = [x for x in seed]
    for _ in xrange(L):
        nxt = 0
        for tap in taps:
            nxt ^= block[-tap]
        block.append(nxt)
    return block

# converts an integer in the range [-2^15,2^15-1] to its 16-bit WAV representation
def to_bin(val):
    if val < 0:
        val = 2**16+val
    ans = []
    for i in xrange(16):
        ans.append(val%2)
        val/=2
    ans.reverse()
    return ans[8:] + ans[:8]

# inverse of to_bin
def to_val(block):
    block = block[8:] + block[:8]
    block.reverse()
    val = sum([x*2**i for i, x in enumerate(block)])
    if val >= 2**15:
        return val - 2**16
    return val

MIN_NOISE = -8
MAX_NOISE = 8
TEST_LEN = 160
def analyze_block(block):
    for noises in product(range(MIN_NOISE, MAX_NOISE+1), repeat=4):
        bnoises = map(to_bin, noises)

        bnoise = bnoises[0] + bnoises[1] + bnoises[2] + bnoises[3] # bnoise is our noise values guess
        lchunk = block[:64]
        lchunk = [b1^b2 for b1, b2 in zip(bnoise, lchunk)] # xor with encrypted data to get LFSR bits

        _, lfsr = berlekamp_massey_algorithm(lchunk)
        taps = [i for i, v in enumerate(lfsr) if v==1] # run Berlekamp-Massey to determine LFSR that gives those 64 bits

        testblock = gen_block(lchunk, taps, TEST_LEN-64) # extend the 64 LFSR bits to TEST_LEN bits...

        descrambled = [b1^b2 for b1, b2 in zip(block[:TEST_LEN], testblock)] # ...and xor with encrypted data to get decrypted audio data
        samples = [abs(to_val(descrambled[16*i:16*(i+1)])) for i in xrange(len(descrambled)/16)]
        # now we need to check if this decrypted audio data is correct
        mean = sum(samples)/(1.*len(samples))
        # print '---', noises, mean
        if mean < 16: # check if it's mostly silent (if the LFSR were wrong, then the values should be random, so mean would be around ~2^14)
            print 'WOW'
            print lchunk, lfsr, taps # if it is silent, then we've likely found the right LFSR!



data = open('eavesdropped_radio', 'rb').read()
# LPCM with a sample rate of 16 kHz and 16 bits per sample (little endian)

# converts byte to 8 binary digits...yes, we did name this "blah"
def blah(x):
    bad = bin(ord(x))[2:]
    better = '0'*(8-len(bad)) + bad
    assert len(better) == 8
    return better

bindata = ''.join(blah(x) for x in data)
bindata = [0 if x == '0' else 1 for x in bindata]
print len(bindata)

# 0.5 seconds = 0.5*16000*16 = 128000 bits

BLOCK_SIZE = 128000
BLOCK_WIDTH = 1000

# loops through the data in .5 second increments and brute-forces noise values as described above
"""
for i in xrange(6*BLOCK_SIZE, len(bindata), BLOCK_SIZE):
    print 'checking block ', i/BLOCK_SIZE
    block = bindata[i:i+BLOCK_WIDTH]
    print i, hex(i/8)
#    print block
    print [to_val(block[j:j+16]) for j in xrange(0, len(block), 16)]
    analyze_block(block)
"""

# now that we've found the correct LFSR, we can decrypt the rest of the file
start = 6*BLOCK_SIZE 
seed =  [0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
taps = [0, 1, 2, 22, 32]
prg = gen_block(seed, taps, len(bindata) - start)

print 'done generating block'

unscrambled = [b1^b2 for b1, b2 in zip(prg, bindata[start:])]
to_byte = lambda a: sum([c*2**(7-i) for i, c in enumerate(a)])
ndata = ''.join(chr(to_byte(unscrambled[i:i+8])) for i in xrange(0,len(unscrambled), 8))
open('uneavesdropped', 'wb').write(ndata)
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s