Bristol Cryptography Blog: June 2015

Friday, June 26, 2015

52 Things: Number 38: What is the difference between a covert channel and a side-channel?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This week we discuss the difference between a covert channel and a side-channel.

Covert channels and side-channels are two types of information leakage channels.

A covert channel uses mechanisms that are not intended for communications, e.g., writing and checking if a file is locked to convey a “1” or “0”. In a covert channel an insider process leaks information to an outsider process not normally allowed to access that information. The insider (sending) process could be a Trojan horse program previously inserted stealthily into the computer. An outsider (receiving) process need only be an unprivileged process [1].

In side-channel attacks, also known as passive non-invasive attacks, the cryptographic device is essentially attacked as it is, i.e. only directly accessible interfaces are exploited. The device is not permanently altered and therefore no evidence of an attack is left behind. The basic idea of side-channel attacks is to determine the secret key of a cryptographic device by measuring its execution time, its power consumption, or its electromagnetic field [2].

In a physical side-channel attack, unconventional techniques are used to deduce secret information. Typically, the device has been stolen or captured by the adversary who then has physical access to it for launching a physical side-channel attack. Traditional side-channel attacks involved differential power analysis and timing analysis. Different amounts of power (or time) used by the device in performing an encryption can be measured and analysed to deduce some or all of the key bits. The number of trials needed in a power or timing side-channel attack could be much less than that needed in mathematical cryptanalysis [1].

In software side-channel attacks a victim process inadvertently assumes the role of the sending process, and a listening (attacker) process assumes the role of the receiving process. If the victim process is performing an encryption using a secret key, a software side-channel attack allows the listening process to get information that leads to partial or full recovery of the key [1].
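As a rough illustration of the software case, the sketch below models a victim that compares a guess against a secret with an early exit. The number of byte comparisons stands in for execution time (an assumption made so that the example is deterministic), and the names `leaky_compare` and `recover_secret` are hypothetical. The listening side never reads the secret directly; it only observes how long each comparison ran and whether it was accepted.

```python
def leaky_compare(secret, guess):
    """Early-exit comparison; returns (equal, number_of_byte_comparisons)."""
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1                      # one unit of "time" per compared byte
        if s != g:
            return False, steps
    return len(secret) == len(guess), steps


def recover_secret(oracle, length):
    """Recover the secret byte by byte from the observed step counts."""
    known = bytearray(length)
    for pos in range(length):
        for candidate in range(256):
            known[pos] = candidate
            equal, steps = oracle(bytes(known))
            # A correct byte at `pos` makes the victim run at least one step longer
            # (or accept outright when `pos` is the last byte).
            if equal or steps > pos + 1:
                break
    return bytes(known)


SECRET = b"k3y!"                        # toy secret, unknown to recover_secret
recovered = recover_secret(lambda guess: leaky_compare(SECRET, guess), len(SECRET))
print(recovered)
```

The attack needs at most 256 queries per byte rather than one query per possible secret, which is the kind of saving over exhaustive search that makes side channels attractive.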

[1] Wang, Zhenghong, and Ruby B. Lee. "Covert and side channels due to processor architecture." Computer Security Applications Conference, 2006. ACSAC'06. 22nd Annual. IEEE, 2006.

[2] Mangard, Stefan, Elisabeth Oswald, and Thomas Popp. Power analysis attacks: Revealing the secrets of smart cards. Vol. 31. Springer Science & Business Media, 2008.

Sunday, June 21, 2015

52 Things: Number 37: The Number Field Sieve

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This week we continue the mathematical attacks with the NFS algorithm.

The Number Field Sieve (NFS) is currently the most efficient known factoring algorithm. Its running time depends on the size of the number to be factored but not on the size of its factors. NFS is based on the idea of factoring by congruent squares: given a large integer $N$, we want to find two integers $x$ and $y$ such that $x^2 \equiv y^2 \pmod N$. Then, with luck, $\gcd(x-y,N)$ is a non-trivial factor of $N$.
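To make the congruent-squares idea concrete, here is a minimal sketch that finds such a pair with Fermat's method, i.e. by searching for an $x$ such that $x^2 - N$ is a perfect square. This is not the NFS itself, only the final gcd step it relies on; the function name and the toy modulus $N = 8051$ are illustrative choices.

```python
from math import gcd, isqrt


def fermat_congruent_squares(N):
    """Search x = ceil(sqrt(N)), ceil(sqrt(N)) + 1, ... until x^2 - N is a square y^2."""
    x = isqrt(N)
    if x * x < N:
        x += 1
    while True:
        y2 = x * x - N
        y = isqrt(y2)
        if y * y == y2:
            return x, y              # x^2 - y^2 = N, so x^2 = y^2 (mod N)
        x += 1


N = 8051                             # toy odd composite
x, y = fermat_congruent_squares(N)
factor = gcd(x - y, N)
print(x, y, factor, N // factor)     # prints: 90 7 83 97
```

Fermat's search is only fast when the two factors are close together; the NFS instead builds the two squares out of many smooth values found by sieving.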

We roughly outline how NFS works. The first step of the algorithm is to choose two monic, irreducible polynomials $f_1$ and $f_2$ of small degrees $d_1$ and $d_2$ that share a common root $m \in \mathbb{Z}$ modulo $N$, i.e. $f_1(m) \equiv f_2(m) \equiv 0 \pmod N$. Let $\theta_1, \theta_2 \in \mathbb{C}$ be complex roots of $f_1$ and $f_2$ respectively. These define two algebraic number fields $\mathbb{Q}(\theta_i)$ and, inside them, two number rings $\mathbb{Z}[\theta_i]$, where $i=1,2$, with multiplication defined as polynomial multiplication modulo $f_i$. Then we define the homomorphisms $\phi_i : \mathbb{Z}[\theta_i] \rightarrow \mathbb{Z}_N$, which map $\theta_i$ to $m$ (where $i=1,2$). The NFS algorithm aims to find two squares, $\gamma_1^2$ and $\gamma_2^2$, one in each of the two number rings, such that $\gamma_1^2= \prod_{(a,b) \in S}(a-b\cdot \theta_1)$ and $\gamma_2^2= \prod_{(a,b) \in S}(a-b\cdot \theta_2)$, where $\gamma_1 \in \mathbb{Z}[\theta_1]$, $\gamma_2 \in \mathbb{Z}[\theta_2]$ and $S$ is a finite set of coprime integer pairs $(a,b)$. In order to find such a set $S$, we sieve the elements of the form $a-b\cdot \theta_i$ for pairs $(a,b)$ such that $a-b\cdot \theta_i$ is smooth over some algebraic factor base. How fast we can find the set $S$ is the key to the efficiency of the algorithm. Next, we need to extract the square root of $\gamma_i^2$ to obtain $\gamma_i$, where $i=1,2$. The methods of Couveignes [1] and Montgomery [2] can be used here. Once the two square roots are calculated, we apply the homomorphisms. Both products map to $\prod_{(a,b) \in S}(a-b\cdot m)$ modulo $N$, so $\phi_1(\gamma_1)^2 \equiv \phi_2(\gamma_2)^2 \pmod N$, and we hope that $\gcd(N,\phi_1(\gamma_1)-\phi_2(\gamma_2))$ is neither $1$ nor $N$, in which case it is a non-trivial factor of $N$.

[1] Couveignes, Jean-Marc. "Computing a square root for the number field sieve." The development of the number field sieve. Springer Berlin Heidelberg, 1993. 95-102.
[2] Montgomery, Peter L. "Square roots of products of algebraic numbers." Mathematics of Computation (1993): 567-571.

Friday, June 12, 2015

Attacking PUF-Based Pattern Matching Key Generators via Helper Data Manipulation

Marcin led the last study group on "Attacking PUF-Based Pattern Matching Key Generators via Helper Data Manipulation" by Jeroen Delvaux and Ingrid Verbauwhede, presented at CT-RSA 2014.

Physically Unclonable Functions (PUFs) can be roughly thought of as 'random' functions accepting a challenge (typically a sequence of bits) as input, and generating a response (a different sequence of bits) that is unique for each PUF and for each physical instance. More precisely, a PUF is a physical device that produces unclonable challenge-response pairs (CRPs); this means that the input/output behavior of any physical copy of a PUF will differ from that of the original due to some uncontrollable randomness in the copying process.

PUFs are emerging hardware primitives which can be used, for example, in key generation applications, replacing the more conventional non-volatile memory (NVM). Instead of storing the secret key in digital memory, PUFs allow it to be derived from the physical characteristics of the integrated circuits (ICs), consequently reducing the risk of physical and invasive attacks.

Unfortunately, there are two main issues concerning PUFs, namely a lack of robustness and a lack of unpredictability. In some applications we would like to obtain the same response every time the corresponding challenge is queried (for example to enable repeatable key generation), but often, due to noise, the responses are not perfectly reproducible, giving rise to CRPs of the form (c, r), (c, r'). Moreover, the response bits are quite likely to be non-uniformly distributed, especially when the number of CRPs is very large. While fixing the latter problem is relatively easy, for example using hash functions, obtaining robustness is more involved.

To overcome both these issues it is necessary to implement additional post-processing logic. There are essentially two different solutions: Fuzzy extractors [1], that perform both error correction (using for example BCH codes) and privacy amplification (applying hash functions), and Pattern Matching Key Generators (PMKGs) [2].

In their work, Delvaux and Verbauwhede describe an attack on PMKGs and also propose a countermeasure to it.

Pattern Matching Key Generators – Description

At a high level we can say that this approach reverses the standard challenge-response format of a PUF.

To describe a PMKG we distinguish an Enrollment phase and a Reconstruction phase.

Enrollment. Consider a stream (Resp) of PUF response bits, corresponding to a certain number of challenges, and refer to any run of W consecutive bits of Resp as a pattern. If Resp consists of L+W-1 bits, then we have L possible patterns.

a. Select one of these patterns at random (using an external interface) and store the index j corresponding to it. The actual corresponding response bits (Patt) are published and form the Public Helper Data (Pub).

Note here that it is the index j that is kept secret, and hence used to derive the secret key, and not the response bits; each index provides log_2(L) bits, assuming L=2^k for some positive integer k.

b. Repeat the previous step H times (Rounds).

c. Concatenate the indices (j_1 || j_2 || ... || j_H) to obtain the full secret key.

Reconstruction. To recover the key, the PUF is iterated through a deterministic set of challenges, obtaining Resp'_i, i=1, ..., H (Resp'_i can be seen as Resp_i+Noise_i). A pattern matching procedure is then performed for every round. Note that Resp'_i contains some noise, so the pattern Patt'_i corresponding to the public Patt_i will be the (only) one which satisfies d(Patt_i,Patt'_i) = t <= T, where T is a fixed and well-chosen threshold value, and d denotes the Hamming distance.
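The enrollment and reconstruction phases can be sketched as follows. This is a toy model under stated assumptions: the PUF response is simulated by a seeded pseudo-random bit list, the noise is exactly two flipped bits per response, and the parameter values W, L, H and T are made up for the example rather than taken from the paper.

```python
import random


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def enroll(resp, W, rng):
    """Pick a secret index j and publish the W-bit pattern found at that index."""
    L = len(resp) - W + 1
    j = rng.randrange(L)
    return j, resp[j:j + W]


def reconstruct(noisy_resp, patt, T):
    """Return every index whose W-bit window is within Hamming distance T of patt."""
    W = len(patt)
    L = len(noisy_resp) - W + 1
    return [i for i in range(L) if hamming(noisy_resp[i:i + W], patt) <= T]


rng = random.Random(2015)               # stands in for the physical randomness
W, L, H, T = 64, 16, 4, 8               # illustrative parameters
key, helper, responses = [], [], []
for _ in range(H):                      # H enrollment rounds
    resp = [rng.randrange(2) for _ in range(L + W - 1)]
    j, patt = enroll(resp, W, rng)
    key.append(j)
    helper.append(patt)
    responses.append(resp)

recovered = []
for resp, patt in zip(responses, helper):
    noisy = list(resp)
    for pos in rng.sample(range(len(noisy)), 2):   # toy noise: flip two bits
        noisy[pos] ^= 1
    recovered.append(reconstruct(noisy, patt, T))

print(key, recovered)
```

Each round should return a single candidate index. An empty list corresponds to a pattern miss and a list with more than one entry to a pattern collision.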

Pattern Matching Key Generators – Attack

To describe the attack the authors first model the failures of a PMKG. It is easy to see that there are two possible failures for key reconstruction: pattern misses and pattern collisions. The first occurs when t > T, and the second occurs if t <= T for some index j' not corresponding to the secret sequence of indices. If we denote by P_MISS and P_COLL the probability of a pattern miss and collision, respectively, it is possible to prove that:

P_FAIL= 1- (1-P_MISS)^H(1-P_COLL)^H,

where P_FAIL indicates the overall failure probability. Intuitively it is clear that pattern misses occur when T is small, whereas pattern collisions are more probable when T is large.
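The trade-off in T can be seen numerically with a small sketch. The formula for P_FAIL is the one above; the models for P_MISS and P_COLL are assumptions made only for this illustration (each pattern bit flips independently with a fixed error rate, and a wrong window agrees with the pattern like a uniformly random string), and the parameter values are made up.

```python
from math import comb


def binom_cdf(n, p, t):
    """P[X <= t] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1))


def p_miss(W, T, bit_error):
    """Assumed model: the true pattern picks up more than T bit errors."""
    return 1 - binom_cdf(W, bit_error, T)


def p_coll(W, T, L):
    """Assumed model: at least one of the L - 1 wrong windows lands within distance T."""
    per_index = binom_cdf(W, 0.5, T)
    return 1 - (1 - per_index) ** (L - 1)


def p_fail(pm, pc, H):
    """Overall failure probability over H rounds, as in the formula above."""
    return 1 - (1 - pm) ** H * (1 - pc) ** H


W, L, H = 64, 1024, 8
for T in (0, 8, 16, 24, 32):
    pm, pc = p_miss(W, T, 0.1), p_coll(W, T, L)
    print(T, pm, pc, p_fail(pm, pc, H))
```

Under these assumptions the failure probability is close to 1 at both ends of the range and lowest for an intermediate threshold.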

In a nutshell, the attack presented in the paper, named SNAKE because of its similarity to the well-known video game, exploits malicious modifications of the public helper string Pub as follows. The idea is to drop the last (rightmost) bit of Pub and introduce a random bit in the first (leftmost) position. In this way the first unexposed bit immediately to the left of Pub is retrieved via statistical properties of the overall failure probability P_FAIL. The same procedure can then be repeated, moving along the PUF response string like a snake. When a consistent change in the failure rate occurs, the secret index j is revealed.

References

[1] Dodis, Yevgeniy, Rafail Ostrovsky, Leonid Reyzin, and Adam Smith. "Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data." SIAM J. Comput., 2008.

[2] Paral, Zdenek Sid, and Srinivas Devadas. "Reliable and efficient PUF-based key generation using pattern matching." Proceedings of the 2011 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), 5-6 June 2011.

Wednesday, June 10, 2015

52 Things: Number 36: Index Calculus Algorithm

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. We continue the mathematical attacks with a description of an index calculus attack...

What is the objective?

An index calculus attack is a method for trying to solve the discrete logarithm problem (DLP). Very briefly, it works by writing the target value as a product of powers of elements in a factor base (elements whose logarithms are already known), then extracting the target logarithm through the laws of logarithms. We now proceed to explain what that means in a bit more detail.

How does it work?

The algorithm can be applied to calculating the discrete logarithm of an arbitrary element $h$ of any group $G=\langle g \rangle$. We will rely on the fact that if $x^ay^bz^c=1$, then $a\log_g(x)+b\log_g(y)+c\log_g(z)=\log_g(1)=0$. So, if we can find some collection of $x_i$ whose logarithms are all known values $L_i=\log_g(x_i)$ and somehow manage to write $h=x_1^{a_1}\dots x_r^{a_r}$, then we know that $\log_g(h)=a_1L_1+\dots+a_rL_r$. The index calculus attack exploits this, and the efficiency (or inefficiency) of the attack comes down to how fast the various stages of this can be done. For context, alongside the generic technique, we will follow an example in terms of the discrete logarithm over the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^*$ with generator $g$, the most common application. Being a little lazy, we will use the terms "offline computation" and "precomputation" interchangeably to refer to work that need only be done once per group. Similarly, "online" and "every time" work refers to work that must be done for every DLP instance.

(Precomputation, basically free) Choose a Factor Base.

The factor base is a collection of elements $\{b_0=g,b_1,\dots,b_r\}\subseteq G$. How to pick them, and how many to pick, depends on the group we're working over and the running times of the later stages. Indeed, the choice of $r$ generally leads to a trade-off between expensive online (small $r$) and offline (large $r$) computation. Working within our example, one would generally pick $-1$ and the first $r$ primes, since these tend to make the online calculations more efficient (see below).

(Precomputation, expensive but very parallel) Find relations between the DLPs of the Factor Base elements.

Using whatever techniques we can (generally just taking arbitrary products and hoping to get lucky!) we find equations relating the different factor base elements to each other. By taking logs, these translate into linear relations between their discrete logarithms. We continue searching for these until we have found $r$ independent relations, which clearly takes longer the bigger we make $r$. That said, this can easily be done in parallel by simply asking each process to search independently and then merging the result sets. Our example works in exactly this way.

(Precomputation, relatively cheap) Solve the Factor Base DLPs

From the previous step, we have a number of linear relationships between the DLs of the factor base elements. In particular, we have $r+1$ equations in $r+1$ variables (since $\log_g(g)=1$ is known a priori), and so can solve to find all their logarithms. Whilst this requires using a large matrix solver, it tends to be basically free compared to the previous and next steps, since solving linear equations is much more efficient than the almost exhaustive nature of searching for relations.

(Online, expensive but very parallel) Write $h$ as a product of factor base elements

We now try to find a value $y$ and a list $a_i$ such that $h g^y = b_1^{a_1} \dots b_r^{a_r}$. This can easily be done in parallel, since each process tries a different collection of $y$ values, stopping as soon as one of them succeeds. Once that's done, we simply take logs of both sides, meaning:

$$\log_g(h) = -y + L_1a_1 + \dots + L_r a_r$$

Now, I've skimmed over a big issue in that previous paragraph: how do we find this $y$? Well, in the case of our example it's not too bad. Because the factor base consists of small primes, we simply try to factor $hg^y$ using traditional division-like techniques. However, in other groups this can be very difficult indeed, and computationally impractical.
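The online stage for our example group can be sketched as below. The parameters $p = 1019$, $g = 2$ (which generates the whole group for this $p$) and the six-prime factor base are toy choices. As a shortcut, the factor base logarithms are found by exhaustive search here, standing in for the relation collection and linear algebra of the precomputation; that substitution is an assumption of the sketch and is only feasible because $p$ is tiny.

```python
def factor_over_base(v, base):
    """Return the exponent list if v factors completely over base, else None."""
    exps = []
    for q in base:
        e = 0
        while v % q == 0:
            v //= q
            e += 1
        exps.append(e)
    return exps if v == 1 else None


def brute_force_log(g, target, p):
    """Toy stand-in for the precomputation: exhaustive search for log_g(target)."""
    acc = 1
    for e in range(p - 1):
        if acc == target:
            return e
        acc = acc * g % p
    raise ValueError("target is not a power of g")


def online_step(g, h, p, base, logs):
    """Find y with h * g^y smooth over base, then read off log_g(h)."""
    for y in range(p - 1):
        exps = factor_over_base(h * pow(g, y, p) % p, base)
        if exps is not None:
            # log_g(h) = -y + L_1 a_1 + ... + L_r a_r, taken modulo the group order
            return (sum(a * L for a, L in zip(exps, logs)) - y) % (p - 1)
    return None


p, g = 1019, 2
base = [2, 3, 5, 7, 11, 13]
logs = [brute_force_log(g, b, p) for b in base]
h = pow(g, 777, p)                      # target with a known answer for checking
x = online_step(g, h, p, base, logs)
print(x)
```

A larger factor base makes more values of $hg^y$ smooth, so fewer $y$ need to be tried online, at the price of more logarithms to precompute.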

A very brief conclusion
So, the Index Calculus algorithm uses the fact that taking logarithms transforms multiplications into sums to try and find the discrete logarithm of a particular point. It does this by building up a table of known results (the factor base), then finding an element related to the target that can be easily written in terms of these. As such, the algorithm is very generic, and by changing the size of the factor base $r$ one recovers a number of obvious classical attacks. However, picking a value of $r$ such that every stage of the computation can be done efficiently is generally not possible, since either the precomputation or online computation (or often both!) will be prohibitively expensive.

Friday, June 5, 2015

52 Things: Number 35: Give the rough idea of Pollard rho, Pollard "kangaroo" and parallel Pollard rho attacks on ECDLP.

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. In this blog post we discuss the Pollard rho, Pollard "kangaroo" and parallel Pollard rho attacks on ECDLP.

Our aim is to solve the discrete logarithm problem $h = g^x$ in a finite cyclic abelian group $G$. Thus, assuming that we have a cyclic group $G = \langle g \rangle$ of prime order $p$ and are given an $h \in G$, we want to find the value of $x$ modulo $p$ such that $h = g^x$. The problem with the Baby-Step/Giant-Step method is that although its run time complexity is $O(\sqrt{p})$, it also requires $O(\sqrt{p})$ space. Hence, we are interested in trading the large space requirement for a smaller one while maintaining a time complexity of $O(\sqrt{p})$. This can be achieved with the following algorithms. [1]

1. Pollard’s Rho Algorithm.

Let $f : S \to S$ be a random mapping from a set $S$ to itself, where $n$ is the size of $S$. For a random value $x_0 \in S$ we compute $x_{i+1} = f(x_i)$ for $i \ge 0$. Each step is a deterministic function of the current position $x_i$, so the values $x_0, x_1, x_2, \dots$ are considered a deterministic random walk.

Since $S$ is finite we will eventually obtain $x_i = x_j$ for some $i \neq j$, and thus $x_{i+1} = f(x_i) = f(x_j) = x_{j+1}$. Hence, the sequence $x_0, x_1, x_2, \dots$ will eventually become cyclic (giving the "rho" shape: ρ). Our goal is to find a collision in a random mapping like the one above, that is, two values $x_i$ and $x_j$ with $i \neq j$ such that $x_i = x_j$.

To find a collision we use Floyd’s cycle finding algorithm: given $(x_1, x_2)$ we compute $(x_2, x_4)$, then $(x_3, x_6)$ and so on, i.e. given the pair $(x_i, x_{2i})$ we compute $(x_{i+1}, x_{2i+2}) = (f(x_i), f(f(x_{2i})))$, and we stop when we find $x_m = x_{2m}$. The expected value of $m$ is $O(\sqrt{n})$.
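Floyd's procedure can be sketched in a few lines. The mapping `f` below is a hypothetical stand-in for a random mapping on a set of 1019 elements; any function from a finite set to itself would do.

```python
def floyd_collision(f, x0):
    """Return (m, x_m) for the first m >= 1 with x_m == x_{2m}."""
    tortoise, hare = f(x0), f(f(x0))      # the pair (x_1, x_2)
    m = 1
    while tortoise != hare:
        tortoise = f(tortoise)            # x_m  -> x_{m+1}
        hare = f(f(hare))                 # x_2m -> x_{2m+2}
        m += 1
    return m, tortoise


def iterate(f, x, steps):
    """Apply f to x the given number of times (used only to check the result)."""
    for _ in range(steps):
        x = f(x)
    return x


f = lambda x: (x * x + 1) % 1019          # hypothetical stand-in for a random mapping
m, value = floyd_collision(f, 2)
print(m, value == iterate(f, 2, 2 * m))
```

Only two values are held in memory at any time, which is where the saving over Baby-Step/Giant-Step comes from.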

For the discrete logarithm problem we partition the group $G$ into three sets $S_1, S_2, S_3$, where we assume that $1 \notin S_2$ (otherwise a walk that reaches $1$ would keep squaring it and never move), and define the following random walk on $G$:

$$x_{i+1} = f(x_i) = \begin{cases} h \cdot x_i & \text{if } x_i \in S_1, \\ x_i^2 & \text{if } x_i \in S_2, \\ g \cdot x_i & \text{if } x_i \in S_3. \end{cases}$$

We actually keep track of the triple $(x_i, a_i, b_i)$, where $n$ now denotes the order of $G$ and

$$a_{i+1} = \begin{cases} a_i & \text{if } x_i \in S_1, \\ 2a_i \pmod n & \text{if } x_i \in S_2, \\ a_i + 1 \pmod n & \text{if } x_i \in S_3, \end{cases} \qquad b_{i+1} = \begin{cases} b_i + 1 \pmod n & \text{if } x_i \in S_1, \\ 2b_i \pmod n & \text{if } x_i \in S_2, \\ b_i & \text{if } x_i \in S_3. \end{cases}$$

Starting with the triple $(x_0, a_0, b_0) = (1, 0, 0)$, for all $i$ we have $\log_g(x_i) = a_i + b_i \log_g(h) = a_i + b_i x$. Applying Floyd’s algorithm we obtain a collision, i.e. a value of $m$ such that $x_m = x_{2m}$. This means that $a_m + b_m x \equiv a_{2m} + b_{2m} x \pmod n$, or $(b_m - b_{2m})x \equiv a_{2m} - a_m \pmod n$, and if $b_m \neq b_{2m}$ we obtain $x = \frac{a_{2m} - a_m}{b_m - b_{2m}} \pmod n$.

Assuming that the sequence $x_0, x_1, x_2, \dots$ is produced by a random mapping from $G$ to itself, the above algorithm will find the discrete logarithm in expected time $O(\sqrt{n})$.
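A minimal sketch of the rho method for discrete logarithms, in a toy subgroup of prime order 509 inside the integers modulo 1019 rather than an elliptic curve group. The partition by residue modulo 3 is an illustrative choice (it keeps the element 1 out of the squaring set), and the retry from a different starting point when the two $b$ values coincide is an addition for robustness, not part of the description above.

```python
def pollard_rho_dlog(g, h, p, n):
    """Solve h = g^x in the subgroup of prime order n generated by g modulo p."""
    def step(x, a, b):
        r = x % 3
        if r == 1:                            # S1: multiply by h
            return x * h % p, a, (b + 1) % n
        if r == 0:                            # S2: square (1 is never in this set)
            return x * x % p, 2 * a % n, 2 * b % n
        return x * g % p, (a + 1) % n, b      # S3: multiply by g

    for a0 in range(n):                       # a0 = 0 gives the start (1, 0, 0)
        slow = fast = (pow(g, a0, p), a0, 0)
        while True:                           # Floyd: compare x_m with x_2m
            slow = step(*slow)
            fast = step(*step(*fast))
            if slow[0] == fast[0]:
                break
        (_, am, bm), (_, a2m, b2m) = slow, fast
        if (bm - b2m) % n != 0:
            return (a2m - am) * pow((bm - b2m) % n, -1, n) % n
    return None


p, n, g = 1019, 509, 4                        # 4 has prime order 509 modulo 1019
h = pow(g, 123, p)                            # target with a known answer for checking
x = pollard_rho_dlog(g, h, p, n)
print(x)
```

The modular inverse via `pow(..., -1, n)` needs Python 3.8 or later and exists because $n$ is prime.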

2. Pollard’s Kangaroo Method.

Pollard’s Kangaroo method is like the Rho method but it is particularly tuned to the situation where we know that the discrete logarithm lies in a certain interval $x \in [a,\dots,b]$.

Let $w = b - a$ be the length of the interval in which the discrete logarithm $x$ is known to lie. We define a set $S = \{s_0,\dots,s_{k-1}\}$ of integers in non-decreasing order, whose mean $m$ should be around $N = \sqrt{w}$. We usually choose $s_i = 2^i$ for $0 \le i < k$ (thus the mean of the set is $m \approx \frac{2^k}{k}$) and also $k \approx \frac{1}{2}\log_2(w)$. The group is divided into $k$ sets $S_i$, for $i = 0,\dots,k-1$. We then define the deterministic random walk: $x_{i+1} = x_i \cdot g^{s_j}$ if $x_i \in S_j$.

We compute the deterministic random walk, starting from $g_0 = g^b$, by setting $g_i = g_{i-1} \cdot g^{s_j}$ for $i = 1,\dots,N$, where $j$ is the index with $g_{i-1} \in S_j$. We also set $c_0 = b$ and $c_{i+1} = c_i + s_j \pmod q$, where $q$ is the order of the group. We store $g_N$ and notice that we have computed the discrete logarithm of $g_N$ with respect to $g$, which is $c_N = \log_g(g_N)$.

Now we have to compute the second deterministic random walk, starting from the unknown point $x$ in the interval. We set $h_0 = h = g^x$ and compute $h_{i+1} = h_i \cdot g^{s'_j}$. We also set $d_0 = 0$ and $d_{i+1} = d_i + s'_j \pmod q$. Notice that we have $\log_g(h_i) = x + d_i$.

Hence, if the path of the $h_i$ meets the path of the $g_i$, then the $h_i$ will carry on along the path of the $g_i$. We will then be able to find a value $M$ where $h_M$ equals our stored point $g_N$.

Thus, we will have $c_N = \log_g(g_N) = \log_g(h_M) = x + d_M$, and the solution to our discrete logarithm problem is given by $x = c_N - d_M \pmod q$.

If we do not get a collision then we can increase $N$ and continue both walks in a similar manner until a collision does occur. The expected running time of this method is $O(\sqrt{w})$ and the storage can be seen to be constant.
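The two walks can be sketched as follows, again in a toy subgroup of prime order 509 modulo 1019. The partition of the group by residue modulo $k$, the interval $[100, 164]$ and the cap on the number of extension rounds are illustrative assumptions. The distances $c$ and $d$ are kept as plain integers so that the sketch can tell when the wild walk must have passed the trap, and they are only reduced modulo $q$ at the end.

```python
from math import isqrt, log2


def kangaroo(g, h, p, q, a, b, max_rounds=8):
    """Look for x in [a, b] with h = g^x mod p, where g has order q."""
    w = b - a
    k = max(1, round(log2(w) / 2))          # k is about (1/2) log2(w)
    jumps = [2 ** i for i in range(k)]      # s_i = 2^i
    N = isqrt(w) + 1                        # about sqrt(w) tame steps per round

    def jump(x):
        return jumps[x % k]                 # x lies in S_j for j = x mod k (illustrative)

    tame, c = pow(g, b, p), b               # g_0 = g^b, c_0 = b
    wild, d = h, 0                          # h_0 = h,   d_0 = 0
    for _ in range(max_rounds):
        for _ in range(N):                  # advance the tame walk and move the trap
            s = jump(tame)
            tame, c = tame * pow(g, s, p) % p, c + s
        while d <= c - a:                   # the wild walk cannot have passed the trap yet
            if wild == tame:
                return (c - d) % q          # x = c_N - d_M (mod q)
            s = jump(wild)
            wild, d = wild * pow(g, s, p) % p, d + s
    return None                             # no collision within max_rounds


p, q, g, a, b = 1019, 509, 4, 100, 164
secret = 137                                # lies in [a, b]
found = kangaroo(g, pow(g, secret, p), p, q, a, b)
print(found)
```

Only the current tame point (the trap) and the current wild point are stored, matching the constant storage noted above. A return value of `None` means the wild walk passed every trap without landing on the tame path.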

3. Parallel Pollard’s Rho Method.

When we use random walk based techniques for solving discrete logarithm problems we often use a parallel version of Pollard's rho. Assuming that we are given the discrete logarithm problem $h = g^x$ in a group $G$ of prime order $q$, we first decide on an easily computable function $H : G \to \{1,\dots,k\}$ ($k$ is usually around 20) and then we define a set of multipliers $m_i$. These are produced by generating random integers $a_i, b_i \in [0,\dots,q-1]$ and then setting $m_i = g^{a_i} h^{b_i}$.

To start the deterministic random walk we randomly pick $s_0, t_0 \in [0,\dots,q-1]$ and compute $g_0 = g^{s_0} h^{t_0}$. The deterministic random walk is then defined on the triples $(g_i, s_i, t_i)$, where $g_{i+1} = g_i \cdot m_{H(g_i)}$, $s_{i+1} = s_i + a_{H(g_i)} \pmod q$ and $t_{i+1} = t_i + b_{H(g_i)} \pmod q$.

Hence, for every $g_i$ we record the values of $s_i$ and $t_i$ such that $g_i = g^{s_i} h^{t_i}$.

If we assume that we have $m$ processors, then each processor can start a different deterministic random walk from a different starting position, using the same algorithm to determine the next element in the walk. When two processors (or even the same processor) meet an element of the group that has been seen before, we obtain the equation $g^{s_i} h^{t_i} = g^{s'_j} h^{t'_j}$, which can be solved for the discrete logarithm $x$.

We expect that after $O(\sqrt{\pi q/2}/m)$ iterations of these parallel walks, a collision will be found and the discrete logarithm problem will be solved. However, this means that each processor needs to return every element in its computed deterministic random walk to a central server, which then stores all the computed elements. This is highly inefficient due to the large storage requirement, namely $O(\sqrt{\pi q/2})$.

The storage can, however, be reduced to any required value as follows. We define a function $d$ on the group, $d : G \to \{0, 1\}$, such that $d(g) = 1$ around $1/2^t$ of the time. The function $d$ is often defined by returning $d(g) = 1$ if, for example, a certain subset of $t$ of the bits representing $g$ are set to zero. The elements of $G$ for which $d(g) = 1$ will be called distinguished.

It is only the distinguished group elements which are now transmitted back to the central server, which means that we expect the deterministic random walks to continue another $2^t$ steps before a collision is detected between two deterministic random walks. Hence, the computing time now becomes $O(\sqrt{\pi q/2}/m + 2^t)$ and the storage becomes $O(\sqrt{\pi q/2}/2^t)$. Thus, storage can be reduced to any manageable amount, at the expense of a little extra computation.
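The distinguished-point idea can be simulated in a single process, as in the sketch below. The "processors" are walkers advanced in turn and the "central server" is a dictionary. The choices of $H$, of the distinguished-point test, of the restart rule for walkers that find nothing useful, and the toy group of order 509 modulo 1019 are all assumptions made for illustration.

```python
import random


def parallel_rho(g, h, p, q, walkers=4, k=20, t=2, seed=0, max_rounds=100_000):
    """Simulate m = walkers processors reporting distinguished points to one server."""
    rng = random.Random(seed)
    coeffs = [(rng.randrange(q), rng.randrange(q)) for _ in range(k)]
    mult = [pow(g, a, p) * pow(h, b, p) % p for a, b in coeffs]   # m_i = g^a_i h^b_i

    def fresh():
        s, u = rng.randrange(q), rng.randrange(q)
        return (pow(g, s, p) * pow(h, u, p) % p, s, u, 0)         # point, s, t, idle steps

    def distinguished(x):
        return (x // k) % (1 << t) == 0      # holds for roughly 1/2^t of the elements

    server = {}                              # distinguished point -> (s, t)
    walks = [fresh() for _ in range(walkers)]
    stale = 20 * (1 << t)                    # give up on walks stuck in a barren cycle
    for _ in range(max_rounds):
        for i in range(walkers):
            x, s, u, idle = walks[i]
            if distinguished(x):
                if x in server:
                    s2, u2 = server[x]
                    if (u - u2) % q:         # g^s h^u = g^s2 h^u2 with u != u2
                        return (s2 - s) * pow((u - u2) % q, -1, q) % q
                    walks[i] = fresh()       # uninformative collision: restart walker
                    continue
                server[x] = (s, u)
                idle = 0
            elif idle > stale:
                walks[i] = fresh()
                continue
            j = x % k                        # H(x), with values 0..k-1 in this sketch
            a, b = coeffs[j]
            walks[i] = (x * mult[j] % p, (s + a) % q, (u + b) % q, idle + 1)
    return None


p, q, g = 1019, 509, 4
secret = 321
x = parallel_rho(g, pow(g, secret, p), p, q)
print(x)
```

Raising `t` shrinks the server's table at the cost of longer walks between reports, which is the trade-off described above.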

[1] http://www.cs.bris.ac.uk/~nigel/Crypto_Book/book.ps (pages 208 - 214)