# SharifCTF 7 – Blobfish

In this crypto challenge, we are provided with a chosen plaintext attack (notably, one that has already been carried out; we cannot actually run a chosen plaintext attack) on a custom designed block cipher whose implementation is provided. The goal is to recover the key of the block cipher.

The cipher in question is a three-round substitution-permutation network. The cipher maps 36-bit inputs to a 36-bit output. Each substitution layer is implemented via 6 S-boxes which act on 6 bits each. Overall, it looks something like this picture from Wikipedia (but without the initial $K_0$, and an additional P step at the end):

An interesting feature of the provided log of the chosen-plaintext attack is that there are several plaintexts which map to very similar ciphertexts. Here’s a short excerpt of the log for reference:

  plaintext  ->  ciphertext
============================
0x02637952D  ->  0x674F229F7
0x02CC0F857  ->  0xE9BFB5DFD
0x02CC0F859  ->  0xE2BFF5DFD
0x03E37952D  ->  0x777A30975
0x0C167731F  ->  0xAE15A3564
0x0C167733F  ->  0xCCA0C3D18
0x12E295065  ->  0xF5E17D8F3
0x12FCD81B5  ->  0x71C92BB21
0x136295065  ->  0x658B0947C
0x142AF9FC3  ->  0x9AFAF4CEF
0x142CF9FC3  ->  0xABBEF534A
...


This was one of the first differential cryptanalysis challenges we’ve solved, so it was pretty interesting (generally I’m much better at more algebraic-based crypto). The key to differential cryptanalysis challenges seems to be to exploit linearity in the components of the cipher, especially linearity of sums/differences of ciphertexts (hence ‘differential’).

For example, let $S(x)$ be the permutation on 36-bit inputs induced by the S-boxes, let $P(x)$ be the permutation on 36-bit inputs induced by the P-box, and let $K_i(x) = x + k_i$ be the permutation on 36-bit inputs induced by XORing with the $i$th key. Then $P(x)$ and $K_{i}(x)$ act linearly on sums of inputs, in the sense that $P(x) + P(y) = P(x+y)$, and $K_{i}(x) + K_{i}(y) = x + y$. If the function $S(x)$ was also linear in this sense (say, satisfying $S(x) + S(y) = S(x+y)$), then we could also easily compute the sum of inputs $x+y$ from the sum of encryptions $f(x)+f(y)$ for any inputs $x$ and $y$. This would allow us to break the cipher and decrypt any value given any plaintext/ciphertext pair (finding the key is a bit harder).

Of course, unlike $P$ and $K_{i}$, $S$ does not act linearly on sums. One could hope that $S$ acts almost linearly on sums and leverage that, but it is not too hard to check that $S$ is far from any linear function. Instead, for this challenge, we’ll use the fact that $P$, $S$, and $K_i$ all preserve (to some extent) the sparsity of the differentials.

To see what we mean by this, let $|x|$ equal the Hamming weight (i.e. the number of 1s) of an input $x$. Then, in addition to being linear, it is also true that $|P(x) + P(y)| = |x+y|$ and $|K_{i}(x) + K_{i}(y)| = |x+y|$. Unfortunately, it is not quite true that $|S(x) + S(y)| = |x+y|$, but we can say some things. For instance if $|x+y| = 0$ (and thus $x=y$), it is also true that $S(x)=S(y)$ and $|S(x)+S(y)|=0$. More generally, since the function $S$ is comprised of 6 smaller S-boxes which each act on 6 bits of the input, $S$ can expand the sparsity by at most a factor of $6$; that is, $|S(x) + S(y)| \leq 6|x+y|$ (there are at most $|x+y|$ blocks of 6 bits where $x$ and $y$ differ; on all the other blocks, $S(x)$ must equal $S(y)$). And in fact, this is a worst-case bound; on average $S$ shouldn’t expand the sparsity by more than a factor of $3$, and in many cases (especially for opportunely chosen inputs) $S$ may not expand the sparsity at all.

Since there are only three rounds, and there are many examples in the log with small values of $|f(x)+f(y)|$ (e.g. the pair 0x02CC0F857 -> 0xE9BFB5DFD, 0x02CC0F859 -> 0xE2BFF5DFD), this gives us hope for an attack. Let $x$ and $y$ be two chosen plaintexts with a very sparse differential (ideally, $|x+y| = 1$). We’ll show how to recover some information about the key from these plaintexts and their encryptions.

Let $x' = f(x)$ and $y' = f(y)$. We can visualize the overall cipher $f$ acting on $x$ and $y$ via the following diagram.

x->S->P->K1->S->P->K2->S->P->k3->x'
y->S->P->K1->S->P->K2->S->P->k3->y'


Let $a = K_1(P(S(x)))$, and let $b = K_1(P(S(y)))$. Since we do not know $K_1$, we cannot compute $a$ or $b$, but we can compute $a+b$: we directly compute $P(S(x))$ and $P(S(y))$, and then $a+b=K_1(P(S(x)))+K_1(P(S(y)))=P(S(x))+P(S(y))$ because the xors cancel out. Likewise, let $a' = S(K_2(P(S(a))))$ (the output of the third $S$ step for $x$), and define $b'$ similarly for $y$. Again, we cannot compute $a'$ or $b'$, but we can compute $a'+b'$ by working backward from $x'+y'$: $a'+b'=P^{-1}(K_3(x'))+P^{-1}(K_3(y'))=P^{-1}(x')+P^{-1}(y')$ because again the xors cancel out. Our diagram now looks something like this:

a->S->P->K2->S->a'
b->S->P->K2->S->b'

known: a+b, a'+b'


Unfortunately, this is as far as we can compute the differentials; knowing $x+y$ does not let us compute $S(x)+S(y)$. However, if $x+y$ is very sparse, then this restricts the number of possibilities of $S(x) + S(y)$. In particular, if $x+y$ is zero in all but $s$ 6-bit blocks, there are at most $64^s$ choices for $S(x)+S(y)$ (this looks like a lot, but is quite manageable if $s$ is less than 3).

This lets us implement a sort of meet-in-the-middle attack. Let $\mathcal{F}(r) = \{S(a) + S(b) | a+b=r\}$, and let $\mathcal{B}(r) = \{a+b|S(a)+S(b)=r\}$. Then, if $a'' = S(a)$ and $b'' = S(b)$, we have that $a''+b'' \in \mathcal{F}(a+b)$ by definition, and working backwards we also have $a''+b'' \in P^{-1}(\mathcal{B}(a'+b'))$, because the $K_2$ xor cancels out and $P^{-1}$ is a linear function. In many cases, this restricts the possibilities for $a''+b''$ down to one or two cases. Each of these cases restricts what several of the blocks in $a$ and $b$ can be (and hence what the blocks of $k_1$ can be). Doing this for enough pairs of similar plaintexts/ciphertexts lets us recover $k_1$ entirely.

Once we have $k_1$, we can do a similar (but even easier) attack to recover $k_2$, based on the input and output differentials to the third $S(x)$ (details omitted). And finally, once we have $k_1$ and $k_2$, we can easily compute $k_3$ from any plaintext/ciphertext pair.

We realize that we left out some details, so here’s a worked example for the second and third line of the log (bits are divided into groups of 6 for clarity, and the notation is the same as above):

We are given
x=0x02CC0F857  ->  x'=0xE9BFB5DFD
y=0x02CC0F859  ->  y'=0xE2BFF5DFD

We can directly compute
P(S(x))=111001 100111 101111 010011 000001 101101
P(S(y))=111001 100111 101111 010111 000001 101101

Therefore a+b=P(S(x))+P(S(y))=000000 000000 000000 000100 000000 000000

We also have
P^-1(x')=110111 001011 111101 110111 011110 011111
P^-1(y')=110111 001011 111101 110111 110011 011111

So a'+b'=P^-1(x')+P^-1(y')=000000 000000 000000 000000 101101 000000

Now let's find the sets F(a+b) and P^-1(B(a'+b')) as described above.

First, F(a+b) corresponds to possible values of a''+b''=S(a)+S(b). There are 64 (technically 63) possibilities, namely all strings of the form:
000000 000000 000000 ****** 000000 000000 (* can be either 0 or 1, but not all *s can be 0)

To compute P^-1(B(a'+b')), we can first determine the 64 possible values in B(a'+b'), or S^-1(a')+S^-1(b'):
000000 000000 000000 000000 ****** 000000
Running this back through P^-1 gives:
000*00 00000* 00000* 00*000 0*0000 0000*0

We got super lucky in this case, because there is only one number in both sets, namely:
000000 000000 000000 001000 000000 000000

Therefore, we have the following two equations:
a   +b   =000000 000000 000000 000100 000000 000000
S(a)+S(b)=000000 000000 000000 001000 000000 000000

Of course, this doesn't give us any information about the 000000 blocks, but this does restrict possibilities for the 4th block.
By bruteforcing all 64^2 pairs, it turns out that there's only two possibilities for this block:
a = ****** ****** ****** 101011 ****** ******
b = ****** ****** ****** 101111 ****** ******
(the second possibility is switching a and b)
Finally, we know that K_1(P(S(x)))=a, so K_1=P(S(x))+a. Therefore, the fourth block of K_1 has two possibilities
K_1=****** ****** ****** 111000 ****** ******
K_1=****** ****** ****** 111100 ****** ******
We've reduced the total number of K_1 possibilities from 2^36 to 2^30*2, which is pretty good for a single plaintext/ciphertext pair!


In the above example, we got really lucky because there was only one possible value in the intersection $\mathcal{F}(a+b) \cap P^{-1}(\mathcal{B}(a'+b'))$, but even if there are multiple possible values, this can still reduce the number of key possibilities, and there are enough pairs in the log to uniquely determine the value of the key using this strategy. However, if the cipher had 4 or more rounds, I’m not sure this strategy would work because the possibilities probably blow up too quickly.

For completeness, here’s the Python script we used to implement the above attack, but it’s uncommented and very messy, so I wouldn’t bother reading it. Though if you do read it and have questions on how it works, feel free to ask.

from itertools import product
from hashlib import md5

rounds = 3
nsbox = 6	# number of s-boxes
bsbox = 6	# input bits per s-box
ssize = 1 << bsbox
bits = nsbox * bsbox
insize = 1 << bits

s = [59, 21, 25, 32, 30, 3, 63, 38, 5, 29, 40, 53, 17, 56, 58, 37, 45, 43, 52, 61, 7, 55, 57, 12, 26, 13, 49, 16, 36, 8, 31, 41, 20, 51, 33, 15, 1, 0, 23, 27, 35, 18, 47, 62, 14, 60, 10, 54, 46, 50, 9, 48, 24, 28, 44, 2, 6, 34, 19, 42, 11, 39, 22, 4]
p = [14, 7, 25, 2, 22, 32, 29, 0, 28, 31, 6, 18, 12, 27, 33, 9, 17, 21, 8, 26, 23, 35, 4, 5, 11, 20, 30, 24, 15, 1, 19, 16, 10, 13, 34, 3]

s_inv = [0] * len(s)
for i in range(len(s)):
s_inv[s[i]] = i

def Sp(r):
ans = dict()
for x in xrange(64):
y = x^r
v = s[x]^s[y]
if v in ans:
ans[v].append((x,y))
else:
ans[v] = [(x,y)]
return ans

def Sinvp(r):
ans = dict()
for x in xrange(64):
y = x^r
v = s_inv[x]^s_inv[y]
if v in ans:
ans[v].append((s_inv[x],s_inv[y]))
else:
ans[v] = [(s_inv[x], s_inv[y])]
return ans

def split(x):
y = [0] * nsbox
for i in range(nsbox):
y[i] = (x >> (i * bsbox)) & 0x3f
return y

def merge(x):
y = 0
for i in range(nsbox):
y |= x[i] << (i * bsbox)
return y

def pbox(x):
y = 0
for i in range(len(p)):
y |= ((x >> i) & 1) << p[i]
return y

def inv_p(x):
y = 0
for i in range(len(p)):
y |= ((x >> p[i]) & 1) << i
return y

Sps = [Sp(r) for r in xrange(64)]
Sinvps = [Sinvp(r) for r in xrange(64)]

def intersection(sdiff, ediff):
sdpts = split(sdiff)
sdsps = [Sps[pt].keys() for pt in sdpts]

edpts = split(ediff)
edsps = [Sinvps[pt].keys() for pt in edpts]

sprod = reduce(lambda x, y: x*y, [len(x) for x in sdsps], 1)
eprod = reduce(lambda x, y: x*y, [len(x) for x in edsps], 1)

if sprod > 1000000 or eprod > 1000000:
print 'abort!'
return None

middiffs = set([pbox(merge(x)) for x in product(*sdsps)])
middiffs2 = set([merge(x) for x in product(*edsps)])

return middiffs&middiffs2

def SBOX(x):
return merge([s[y] for y in split(x)])

def start_diff(px, py):
vx = pbox(merge([s[pt] for pt in split(px)]))
vy = pbox(merge([s[pt] for pt in split(py)]))
return vx^vy

def end_diff(cx, cy):
return inv_p(cx^cy)

def filter_k1(px, py, middiffs, k1):
vx = pbox(merge([s[pt] for pt in split(px)]))
vy = pbox(merge([s[pt] for pt in split(py)]))
vxpts = split(vx)
vypts = split(vy)

possk = [set() for _ in xrange(6)]

for middiff in middiffs:
minvPdiff = inv_p(middiff)
mdiffpts = split(minvPdiff)

for i in xrange(6):
possi = Sinvps[mdiffpts[i]][vxpts[i]^vypts[i]]
# list of possible (vx^k, vy^k)
for z, _ in possi:

for i in xrange(6):
k1[i] = [z for z in k1[i] if z in possk[i]]

return k1

def obtain_k1():
k1 = [range(64) for _ in xrange(6)]
for i in xrange(0, len(data), 2):
px = data[i][0]
cx = data[i][1]
py = data[i+1][0]
cy = data[i+1][1]

sdiff = start_diff(px, py)
ediff = end_diff(cx, cy)
middiff = intersection(sdiff, ediff)
if middiff:
print middiff

k1 = filter_k1(px, py, middiff, k1)
print 'UPDATE', k1
return k1

def filter_k2(px, py, cx, cy, k1, k2):
sx = pbox(SBOX(k1^pbox(SBOX(px))))
sy = pbox(SBOX(k1^pbox(SBOX(py))))

sdiff = sx^sy
cdiff = cx^cy
ediff = inv_p(cdiff)

sxpts = split(sx)
sdiffpts = split(sdiff)
cdiffpts = split(cdiff)
ediffpts = split(ediff)

possk = [set() for _ in xrange(6)]
for i in xrange(6):
possi = Sinvps[ediffpts[i]][sdiffpts[i]]
for z, _ in possi:

for i in xrange(6):
k2[i] = [z for z in k2[i] if z in possk[i]]

def obtain_k2(k1):
k2 = [range(64) for _ in xrange(6)]
for i in xrange(0, len(data), 2):
px = data[i][0]
cx = data[i][1]
py = data[i+1][0]
cy = data[i+1][1]

filter_k2(px,py, cx, cy, k1, k2)
print 'UPDATE K2:', k2
return k2

def obtain_k3(k1, k2):
px = data[0][0]
cx = data[0][1]

return pbox(SBOX(k2^pbox(SBOX(k1^pbox(SBOX(px))))))^cx

lines = [line for line in open('bloblog.txt','r')]
data = [map(lambda x: int(x, 16), line.split('|')) for line in lines]

k1 = obtain_k1()
assert all([len(x)==1 for x in k1])
k1 = merge([x[0] for x in k1])

k2 = obtain_k2(k1)
assert all([len(x)==1 for x in k2])
k2 = merge([x[0] for x in k2])

k3 = obtain_k3(k1, k2)

print k1, k2, k3

def make_flag(k):
param = md5(b"%x%x%x" % (k[0], k[1], k[2])).hexdigest()
return "SharifCTF{%s}" % param

print make_flag((k1, k2, k3))