CSAW Quals 2016 – Still Broken Box

[Note: this is a follow up to the challenge Broken Box. It may be useful to view the writeup for that challenge first].

A while after completing Broken Box, I decided to take a look at Still Broken Box. After playing around for a bit with it on netcat, it seemed like it had the same flaw as Broken Box, so I tried running my script from Broken Box (updated with the new versions of N and e) on the new server.

Interestingly, I got kicked from the server before I managed to recover all of d. I modified the script to log the bits it recovered and tried it again.

from Crypto.Util.number import *
from pwn import *
import time

true_sig = 54991508894724492039711106003951077794074321022420375497245003670448433435320503293883469699552869556601348649896639613951738858915532073464798066917763705885955932628254429128249094851832789202734511238104858532784980373075424816483767415915147869578107622550421197873111079210230709729680555907076587324573
N = 123541066875660402939610015253549618669091153006444623444081648798612931426804474097249983622908131771026653322601466480170685973651622700515979315988600405563682920330486664845273165214922371767569956347920192959023447480720231820595590003596802409832935911909527048717061219934819426128006895966231433690709
e = 97

assert pow(true_sig, e, N) == 2

two_inv = (N+1)/2
assert (2*two_inv)%N == 1

tps = [pow(2, 2**i, N) for i in xrange(1024)]
itps = [pow(two_inv, 2**i, N) for i in xrange(1024)]

true_sig_inv = inverse(true_sig, N)
assert (true_sig_inv*true_sig)%N==1

print 'done preprocessing'
print 'starting challenge'

d_bits = [-1]*1024
def get_d():
    return sum([2**i for i, v in enumerate(d_bits) if v==1])

    conn = remote('crypto.chal.csaw.io', 8003)
    while d_bits.count(-1)>0:
        x = conn.recvline().split(':')
        sind = x.index('signature')
        curS = int(x[sind+1].split(',')[0])
        curN = int(x[sind+2])
        print 'S', curS
        print 'N', curN
        print '# bits left:', d_bits.count(-1)
        print 'cur d:', get_d()

        assert curN == N
        if curS == true_sig:
            print 'true signature'
            quot = (curS*true_sig_inv)%N
            if quot in tps:
                bit_loc = tps.index(quot)
                if d_bits[bit_loc]==-1:
                    print 'found bit:', bit_loc
                    assert d_bits[bit_loc]==0
            elif quot in itps:
                bit_loc = itps.index(quot)
                if d_bits[bit_loc]==-1:
                    print 'found bit:', bit_loc
                    assert d_bits[bit_loc]==1
                print 'WARNING: quot not in tps or itps'
    print "got kicked"
    print "found bits for: "
    print [i for i in xrange(len(d_bits)) if d_bits[i]!=-1]
    print get_d()

print 'cur d:', get_d()
# 48553333005218622988737502487331247543207235050962932759743329631099614121360173210513133
# ^ first 300 bits of d

It turns out that the improved broken box only leaked the first 300 bits of d. This is still a lot of information though; I vaguely remembered some crypto paper I once saw where they recovered the full private key from only its low or high order bits. Looking around, I found this survey, which led me to the Boneh-Durfee-Frankel paper, which describes exactly how to do this.

The rest of the challenge was me fiddling around with implementing this attack. The math behind this attack is really cool though, so I’ll give a rough outline of how and why the attack works here.

The first step of the attack is a reduction of the problem of recovering d from its first 300 bits to the problem of prime factoring N given the first 300 bits of one of its prime factors (this is the main contribution of Boneh-Durfee-Frankel). Here we are immensely helped by the fact that e is small (namely, it is 97 in this problem). To use the fact, we note the following: since de \equiv 1 \bmod \phi(N), there is some k \leq e such that de = k\phi(N) + 1.

Now, if we know \phi(N), we can factor N; if we let N = pq, then \phi(N) = pq - (p+q) + 1, so p and q are roots of the quadratic x^2 + (\phi(N) - N - 1)x + N = 0. And we know that k\phi(N) = de - 1, so multiplying by k and substituting this in, we know that p is a root of:

kx^2 + (de - kN - k - 1)x + kN = 0

Unfortunately, we only know the value of d modulo 2^{300}, so we actually only know that p is a root of:

kx^2 + (de - kN - k - 1)x + kN \equiv 0 \bmod 2^{300}

But if we successfully guess k, we can solve this quadratic modulo 2^{300} (by say, Hensel lifting) to get a relatively short list of possibilities for p modulo 2^{300} (in the absolute worst case, you could have up to 2^{150} solutions to this quadratic, but on average you should only get around 2 solutions per value of k). By looping over all values of k, we are guaranteed to find the true value of p \bmod 2^{300} at some point.

Okay, so now we have the value of p modulo some big number r (2^{300} in our case). How can we actually recover p and factor N? It turns out that there are a couple of ways to do this, but they all rely on the technique of using lattice-based methods (particularly, the LLL algorithm) to find small solutions to certain polynomial equations. The original paper that did this was due to Coppersmith, where he used LLL to find small solutions for P and Q to the bivariate equation

(rP + p_0)(rQ + q_0) = N

(here p_0 = p \bmod r and q_0 = q \bmod r; note that from p_0 we can find q_0 since p_0q_0 = N \bmod r).

I read through Coppersmith’s paper, but quickly realized that implementing it was going to be a huge pain (it involved working over reals and doing multiple steps of row-reduction). I also tried in vain for a while to find an already existing implementation somewhere of Coppersmith’s algorithm, but somehow came up short (I still don’t know of any implementation of Coppersmith’s algorithm for the bivariate case). I then spent a while looking for alternatives/improvements to Coppersmith’s algorithm. I found this paper by Coron which simplified Coppersmith’s algorithm and worked over finite fields. I spent a while implementing it, but it ended up being quite slow and I think there was a bug in my implementation.

I ended up taking a look at the original paper by Boneh-Durfee-Frankel and noticed that in their Experiments section, they mentioned that instead of directly using Coppersmith’s algorithm, they used the Lattice Factorizing Method, which in practice performed better and was easier to implement (whoops, I should have read this first). I’ll spend the rest of the post describing this method for factoring N.

Recall that are given p_0 = p \bmod r and we would like to recover p. We can rephrase this as finding the value of P for which Pr + p_0 = p. We also know that P is not too big; in particular, if p is the smaller factor of N, then P \leq \sqrt{N}/r. We can therefore rephrase the problem as finding some small P for which Pr + p_0 and N share a nontrivial gcd. Letting A = r^{-1}p_0 \bmod N, note that \gcd(Pr + p_0, N) | \gcd(P + A, N), so we would really just like to find a number “close” to A which shares a common factor (in particular, p) with N.

We now make the following simple (but brilliant) observation: if a polynomial h(x) has small coefficients, small degree, and there is a small X such that h(X) is divisible by a large number M, then h(X) must actually equal 0. Intuitively, this is because if you evaluate a polynomial with small coefficients and small degree at a small point, its value cannot be as large as M in magnitude, so if it is divisible by M, it must equal M. (How small and large do “small” and “large” have to be? See Fact 3.2 of the paper – basically, we just want the L_2 norm of the coefficients of h(Xx) to be at most M/\sqrt{d}, where d is the degree of the polynomial).

How do we apply this fact to our case? Well, we could hope that x+A has small enough coefficients and that p is large enough and that the bound on x is small enough, but it turns out this is not true (in particular A is around the same magnitude as N, which makes things troublesome). It would be nice if there were some way to amplify the smallness of our polynomial or the largeness of our factor p. Well, if x+A is divisible by p then (x+A)^{m} is divisible by p^m. This increases the size of our modulus M, but unfortunately also explodes the size of the coefficients (the degree also increases to m, but this will turn out to be negligible). There are other polynomials we can construct where are also divisible by p^m at our desired value of x – e.g. N^{m-i}(x+A)^{i} and x^{i}(x+A)^{m} but these also all have huge coefficients.

There is one thing we can hope for though; we can hope that, if we add together a bunch of these polynomials, with the right coefficients:

a_0N^{m} + a_1N^{m-1}(x+A) + \dots + a_{k-1}N(x+A)^{m-1} + (b_0 + b_1x+b_2x^2 + \dots + b_tx^{t})(x+A)^{m}

then maybe in the resulting polynomial there will be tons of cancellation and all the coefficients will magically be small enough for our observation to apply. For an arbitrary choice of coefficients, this is ridiculous, but this is exactly where we can use LLL. In particular, if we input into LLL the coefficients of all of the basis polynomials h_i(x) (technically, we actually want the coefficients of h(Xx) where X is an upper bound on the size of x), then LLL will find some linear combination of these polynomials such that the resulting coefficient vector is “short”; which is exactly what we want! It turns out that if we choose the values of m and t optimally then the guarantees on LLL will imply that this method works. (You can read the analysis in the paper).

It turns out this method was implemented in Sage (https://trac.sagemath.org/ticket/2424), but not knowing this I went ahead and implemented it from scratch. Here is the resulting (fairly sloppy) Sage code:

# factors N if N = p*q and there exists a
# solution to x+B = 0 mod p w. |x| < X.
# d and m are parameters (runs LLL of dimension d eventually)
def lattice_factorization_method(N, B, X, d, m):
    assert m < d
    P = PolynomialRing(QQ, "x")
    x = P.0
    f = x+B
    fX = X*x+B
    L_rows = []
    for k in xrange(m):
        g = N^(m-k) * fX^k
        L_rows.append([g[i] for i in xrange(d)])

    for j in xrange(d-m):
        g = (X*x)^j * fX^m
        L_rows.append([g[i] for i in xrange(d)])

    L = Matrix(L_rows)
    # print L
    L_redux = L.LLL()
    # print L_redux
    for row in L_redux:
        h = sum([(c/X^i)*x^i for i,c in enumerate(row)])
        for root, _ in h.roots():
                root = Integer(root)
                g = gcd(root+B, N)
                if g>1 and g<N:
                    return (g, N/g)
    print 'no solutions'
    return None

# factors N given that p = p0 mod r, with r>=N(1/4)
def factor_from_low_bits(N, r, p0):
    assert r>=(N**0.25)

    B = (p0*r.inverse_mod(N))%N
    X = int((N**0.5)/r)
    # print N, B, X
    return lattice_factorization_method(N, B, X, 30, 14)

# finds all solutions to ax^2 + bx + c = 0 mod 2^k
def solve_quadratic_mod_pow2(a, b, c, k):
    Q = lambda x: a*x^2 + b*x + c
    poss = []
    if Q(0)%2==0:
    if Q(1)%2==0:

    # print a, b, c, poss
    for i in xrange(1, k):
        nposs = []
        tp = 2^i
        for p in poss:
            if Q(p)%(2*tp)==0:
            if Q(p+tp)%(2*tp)==0:
            poss = nposs
    return poss

# factors N=p*q given d0 where d0=d mod 2^log(r) and d*e=1 mod phi(N)
def bdf(N, e, d0, logr):
    r = 2^logr
    for k in xrange(1, e):
        # get quadratic
        print 'trying k=', k
        a = k
        b = -(e*d0 - k*N - k - 1)
        c = k*N
        quad_sols = solve_quadratic_mod_pow2(a, b, c, logr)
        print len(quad_sols), "solutions to quadratic"
        for i, p0 in enumerate(quad_sols):
            print i, '/', len(quad_sols)
            print 'try to low_bit factor p0=',p0
            sol = factor_from_low_bits(N, r, p0)
            if sol:
                print 'FOUND IT'
                print sol
                return sol

N = 123541066875660402939610015253549618669091153006444623444081648798612931426804474097249983622908131771026653322601466480170685973651622700515979315988600405563682920330486664845273165214922371767569956347920192959023447480720231820595590003596802409832935911909527048717061219934819426128006895966231433690709
e = 97
d0 = 48553333005218622988737502487331247543207235050962932759743329631099614121360173210513133
logr = 300
print bdf(N, e, d0, logr)

Running this for about 20 minutes, we obtain our value of d, and hence our answer:

trying k= 57
8 solutions to quadratic
0 / 8
try to low_bit factor p0= 58657369744454566721200703256814591930211340416069540835353194038459870911563795690085921
no solutions
1 / 8
try to low_bit factor p0= 1077175357911697609855423547461503672455945537249037666153423418715650520793232148781784609
(10734991637891904881084049063230500677461594645206400955916129307892684665074341324245311828467206439443570911177697615473846955787537749526647352553710047, 11508259255609528178782985672384489181881780969423759372962395789423779211087080016838545204916636221839732993706338791571211260830264085606598128514985547)
>>> p = 10734991637891904881084049063230500677461594645206400955916129307892684665074341324245311828467206439443570911177697615473846955787537749526647352553710047
>>> q = 11508259255609528178782985672384489181881780969423759372962395789423779211087080016838545204916636221839732993706338791571211260830264085606598128514985547
>>> N = p*q
>>> e = 97
>>> d = inverse(e, (p-1)*(q-1))
>>> flag = int(open('still_flag.enc', 'r').read())
>>> pow(flag, d, N)
>>> long_to_bytes(_)

One thought on “CSAW Quals 2016 – Still Broken Box

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s