Wednesday, April 12, 2017

Is Your Banking App Secure?

Last week I was in Malta for Financial Cryptography and Data Security 2017 to present my recent work on securing the PKCS#11 cryptographic API.

One talk that stood out for me was by researchers from the University of Birmingham, who looked for vulnerabilities in the mobile apps provided by major UK banks.

Sadly, they found major weaknesses in apps from 5 of the 15 banks they investigated.

Several apps use certificate pinning, where the app hard-codes a certificate from a trusted CA and only accepts public keys that are signed by the pinned certificate.
This is good practice, as an attacker can add their own certificate to the phone's trust store, but it won't be accepted by the app.
However, two Android apps (for Natwest and Co-op) accepted any public key signed by the pinned certificate, without checking the domain name!
So the attack works as follows:

  1. Purchase a certificate for a domain you own from the trusted CA
  2. The app will accept your public key with this certificate
  3. Man-in-the-middle all the encrypted traffic between the user and their bank.

Curiously, the authors note: "Co-op [...] hired two penetration testing companies to test their apps, both of which had missed this vulnerability". It seems odd that such an obvious mistake wasn't picked up in testing.

The group also found that several banks - Santander, First Trust and Allied Irish - served adverts to their app users over unencrypted HTTP, meaning an attacker could spoof these ads and mount a phishing scam, perhaps by displaying a fake 'security warning' and directing users to re-enter their account details on a malicious page. It was pointed out in the talk that we're much more likely to 'feel safe' within an app (and hence trust all the content we see) than, say, visiting a webpage using a laptop, so this kind of in-app phishing scam could be very effective.

There are even more exploits described in the paper.

It was refreshing to hear that the vulnerable banks responded well to the disclosures made by the Birmingham group and patched their apps as a result. But I'm a little baffled that these basic errors were ever made in such security critical applications.

Wednesday, March 29, 2017

PKC 2017: Kenny Paterson accepting bets on breaking TLS 1.3

The member of the TLS 1.3 working group is willing to bet for a beer that the 0-RTT handshake of TLS 1.3 will get broken in the first two years.

In his invited talk, Kenny managed to fill a whole hour on the history of SSL/TLS without even mentioning symmetric cryptography beyond keywords, thus staying within the topic of the conference. Despite all versions of SSL being broken to at least some degree, the standardised TLS became the most import security protocol on the Internet.

The core part of TLS is the handshake protocol, which establishes the choice of ciphers and the session key. Kenny highlighted the high complexity stemming from the many choices (e.g., using a dedicated key exchange protocol or not) and the possible interaction with other protocols in TLS. Together with further weaknesses of the specification, this created the space for the many attacks we have seen. On the upside, these attacks express an increased attention by academics, which comes together with an increased attention by the society as whole. Both have laid the ground for improvements in both the deployment and future versions of TLS. For example, the support of forward secrecy has increased from 12 percent to 86 according to SSL pulse.

Turning to concrete attacks, most important in the area of PKC is the Bleichenbacher attack published already at Crypto 1998 (a human born then would a considered a full adult at the conference venue now). Essentially, it exploits that RSA with the padding used in TLS is not CCA-secure, and it recovers the session key after roughly $2^{20}$ interactions with a server. Nevertheless, the TLS 1.0 specification published shortly after Bleichenbacher's publication incorporates the problematic padding (recommending mitigation measures), and later versions retain it for compatibility. The DROWN shows the danger of this by exploiting the fact that many servers still offer SSLv2 (about 8% of Alexa top 200k) and that it is common to use the same key for several protocol versions. An attacker can recover the session key of a TLS session by replaying a part of it in an SSLv2 session that uses the same key.

On a more positive note, Kenny presented the upcoming TLS 1.3, which is under development since 2014. It addresses a lot of the weaknesses of previous versions by involving academics from an early stage and doing away with a lot of the complexity (including reducing options and removing ciphers). It furthermore aims to decrease latency of the handshake by allowing the parties to send encrypted data as early as possible, reducing the round trip time to one in many cases. The goal of low latency has also led to the inclusion of QUIC, which provides zero round trip time, that is, the client can send data already in the first message when resuming a session. However, QUIC is not fully forward-secure and therefore confined to a separate API. Nevertheless, Kenny predicts that the sole availability will be too tempting for developers, hence the bet offered.

Concluding, he sees three major shifts in TLS this far: from RSA to elliptic-curve Diffie-Hellman, to Curve25519, and away from SHA-1 in certificates. A fourth shift might happen with the introduction of post-quantum algorithms such as Google's deployment of New Hope. Less optimistically, he expects that implementation vulnerabilities will continue to come up.

Update: An earlier version of this post mentioned the non-existing Curve255199 instead of Curve25519, and it attributed New Hope to Google.

Monday, March 27, 2017

Tools for proofs

Security proof  for even simple cryptographic systems are dangerous and ugly beasts. Luckily, they are only rarely seen: they are usually safely kept in the confines of ``future full-versions'' of papers, or only appear in cartoon-ish form, generically labelled as ... ``proof sketch". 

The following two quotes frame the problem in less metaphorical terms. 

``In our opinion, many proofs in cryptography have become essentially  unverifiable. Our field may be approaching a crisis of rigor".

                                              Bellare and Rogaway (cca 2004)

``Do we have a problem with cryptographic proofs? Yes, we

do [...] We generate more proofs than we carefully verify
(and as a consequence some of our published proofs are
                                                                     Halevi (cca 2005)

Solutions developed by cryptographers e.g. compositional reasoning and the game-hopping technique, help to structure proofs and reduce their complexity and therefore alleviate to some extent the pain of having to develop rigorous proofs. Yet, more often than not proofs are still sketchy and shady.

There is help that comes from the programming languages community which has a long experience with developing tools for proving that programs work correctly and...cryptographic systems are just programs. Recent progress,  e.g. automated verification of parts of TLS, fully verified security proofs of implementation masking schemes to defeat leakage, is impressive and exciting.  More work is under way. 

If you want to learn more about how can you get someone else to do the proofs for you or, more realistically, learn about what existent tools can currently do, what they could do in the future, and discuss what is needed and which way to go, then you should attend the 
Workshop on Models and Tools for Security Analysis and Proofs
 -- Paris, 29th of April; co-located with EuroSnP and Eurocrypt --

which the Bristol Crypto group helps organize. The workshop features as speakers some of the most prominent researchers that are contributing to this direction. You can register for the workshop HERE. Early registration ends March 31st!

But wait...there is more. If you want to explore this area beyond what a one-day workshop allows, then you should consider attending the

Nancy, France, July 10th - 13th --

See you all in Paris and/or Nancy!

Tuesday, February 21, 2017

Homomorphic Encryption API Software Library

The Homomorphic Encryption Application Programming Interface (HE-API) software library is an open source software library being developed as part of the Homomorphic Encryption Applications and Technology (HEAT) project, and is available here. The main purpose of this software library is to provide a common easy-to-use interface for various existing Somewhat Homomorphic Encryption (SHE) libraries. Limited support for fixed-point arithmetic is also provided by this library. Note that the HE-API library is still a work in progress.

Fully Homomorphic Encryption (FHE) is a cryptographic primitive that allows meaningful manipulation of ciphertexts. In spite of several recent advances, FHE remains out of practical reach. Hence a reasonable restriction to make is to limit the set of evaluated circuits to a specified subclass, usually determined by the multiplicative depth of the circuit. Such encryption schemes are called as SHE schemes.  Various libraries such as HElib, SEAL, FV-NFLlib, HElib-MP, etc., are already available that implement these SHE schemes.

The purpose of this HE-API software library is to provide a common, generic, easy-to-use interface for various existing libraries that implement SHE schemes. The SHE libraries that are currently integrated in the HE-API library are HElib and FV-NFLlib. It may be noted that the FV-NFLlib library is itself an outcome of the HEAT project. At a high-level, the HE-API software library abstracts out the technicalities present in the underlying SHE libraries. For instance, the HElib library implements the BGV SHE scheme, while the FV-NFLlib implements the FV SHE scheme. Needless to say, the syntax for various classes and routines in the individual libraries will be different, though the underlying semantics are very similar. The HE-API library integrates the underlying SHE libraries under a single interface, thereby shielding the user from syntactic differences. Another feature of the HE-API library is that it contains minimal, yet complete, set of routines to perform homomorphic computations. The design of this library is motivated by the ease of use for non-experts.

Supported Data Types
The following application data types are supported by the HE-API software library. 
  • Boolean
  • Unsigned long integers
  • GMP's arbitrary precision integers class: mpz_class
  • Polynomials with coefficients of type: unsigned long integers or mpz_class
  • Vectors of : unsigned long integers or mpz_class
  • Fixed-point numbers
Note that all the data types and routines described above may not be currently supported by every underlying SHE library.

Friday, January 13, 2017

RWC 2017 - Secure MPC at Google

This talk was given by Ben Kreuter and its focus was on the apparent disparity between what we research in academia versus what is required in the real world, specifically in the field of multi-party computation (MPC). MPC is the idea of allowing multiple parties to compute some function on their combined input without any party revealing anything about their input to the other parties (other than what can be learnt from the output alone).

While significant work has been done on making MPC efficient in practice (for example, the work of Yehuda Lindell et al. on high-throughput MPC which was presented by Lindell in the preceding talk), the focus tends to be on generic protocols (e.g. general logic circuits) with strong security guarantees (e.g. malicious security), which invariably leads to large computational overhead. In practice, we usually require only specific protocols, which can therefore be optimised, and comparatively weak security guarantees.

In the real world, network cost is the salient factor, rather than the speed of the protocol, since the parties who are involved in a computation often have to use networks (such as the Internet) which are being used by many other people at the same time and cannot make the best use of the network's full capabilities. The MPC at Google is about computation amongst, for example, mobile phones, laptops and servers; this introduces issues like battery constraints and the possibility of the computation not completing; these considerations, firmly grounded in the real world, are important when developing MPC techniques in research.

Business applications

A large portion of Google's revenue is generated by advertising: the tech giant, well-known for its aptitude for accurately determining users' desired search results even when queries are expressed ineloquently, specialises in creating personalised adverts to its wide spectrum of users. The efficacy of an advert is generally measured by the proportion of viewers of it who later become customers. Clearly this can be done by businesses comparing their database of customers' transactions with Google's databases of who has been shown which adverts. This, however, would be an invasion of privacy: instead, Google and the business can do MPC: more specifically, a private set intersection protocol.

In a private set intersection protocol, the parties involved compute how large the intersection is amongst the sets input by each party, or even some function on those elements in the intersection. So if the business and Google compute a private set intersection protocol on their data, they can determine how well the advertising went.

Roughly speaking, the MPC Google does in the real world is as follows: Google has a set $\{g_1,g_2,...,g_n\}$ of field elements which encodes a set of people who have been shown an advert for a certain product, and a business has a set $\{b_1,b_2,...,b_m\}$ of field elements which encodes a set of people who have been sold the product in question; Google raises each of its elements to a power $G$ and sends the set $\{g_1^G,g_2^G,...,g_n^G\}$ to the business. The business does the same with its elements for some exponent $B$ to get $\{b_1^B,b_2^B,...,b_m^B\}$, encrypts a set of binary vectors under Paillier encryption (which is additively homomorphic), one corresponding to each element in its set, encoding some other property of the sales (like the amount paid), and also computes the set $\{g_1^{GB},g_2^{GB},...,g_n^{GB}\}$. The business sends Google the set of pairs $\{(b_1^B,P(v_1)),(b_2^B,P(v_2)),...,(b_m^B,P(v_m))\}$ along with $\{g_1^{GB},g_2^{GB},...,g_n^{GB}\}$, and Google computes $\{b_1^{GB},b_2^{GB},...,b_m^{GB}\}$ and adds together all encrypted vectors $P(v_i)$ for which there exists some $j$ such that $g_i^{GB}=b_j^{GB}$. It sends this ciphertext back to the business, which decrypts and interprets the result.

This protocol is very simple, and it is only passively secure (in which players are assumed to execute the protocol faithfully but will possibly try to learn things by inspecting their communication transcripts). An interesting, perhaps somewhat orthogonal concern, to how we approach research from an academic point of view is that it is important that we can convey the security and efficiency of our protocols to lawyers, managers and software engineers who will eventually be sanctioning, authorising or implementing the protocols. "The lawyers are interesting because you can show them a proof, and two plus two equals four is a negotiable statement here... managers usually trust your expertise...and software engineers are the worst because they already assume [the protocol] is impossible."

An alternative solution using garbled circuits was explored in the recent past, but it turned out that their use required some subtle assumptions regarding the computation and communication which would have made the protocol impractical.

Future work would involve getting a (not too much more expensive) maliciously secure protocol and developing the use of the homomorphic encryption to allow different functions to be computed on the data in the intersection.

Consumer applications

The Android keyboard app by Google, Gboard, logs what a user types so that it can guess words for auto-completing in the future. This data could be used for training machine learning models, and merging results from many local models would enable the formation of guessing algorithms that work well for everyone. However, to do this, the server would need to receive a set large dataset of words typed by a user from each phone so that this processing could be done. Clearly there is an issue of privacy here; moreover, there is also potentially a differential privacy issue.

This is clearly a good situation in which to use MPC. Each party masks their data using a basic additive secret-sharing scheme: if each party has a vector to input, for every coordinate, every pair of parties agrees on some random field element, one subtracts and one adds this to that coordinate of their vector. When the parties send this to Google, the masks will therefore cancel when added together.

In practice,they use a PRG and perform a key exchange (in which one key is given to each pair of parties, for every possible pair) at the beginning to achieve the same effect but with much smaller communication overhead. They also have a trick for dealing with device failures (which is important given the application).

This talk provided helpful and relevant insight into the the importance of matching what we research with what we require in the real world, which is, after all, one of the main reasons for having conferences such as Real World Crypto. Many of the talks are available to watch online here, and I would highly recommend doing so if interested.

Thursday, January 12, 2017

RWC 2017 - Is Password Insecurity Inevitable?

Fresh back from an enlightening trip across the pond, I wanted to write about one of my favourite talks, all about password (in)security, from this year's Real World Cryptography conference.

As we know:
  1. Passwords protect everything.
  2. Passwords are terrible.
But happily, Hugo Krawczyk from IBM Research spoke about some great new work to resolve these two seemingly incompatible statements. There were a lot of details in the talk that I'll have to miss out here (slides are available online). In particular, I'm going to focus on 'Part I: Take the burden of choosing and memorising passwords off humans'.

The basic idea - this isn't new - is to have the user memorise a single master password that they use to access a password store. Then the password store derives unique pseudorandom passwords for each service the user wants to access (Google, Facebook, etc.) The problem with this solution is that the password store becomes a single point of failure: if it is compromised, then an offline dictionary attack to find the master password will compromise all of the user's accounts at once.

Krawczyk et al. suggest an improvement: SPHINX, which amusingly stands for "a password Store that Perfectly Hides from Itself (No eXaggeration)". The first idea is for the password store to not keep hold of (even a hash of) the master password - instead it has an independent secret key $k$, and any time the user wants to log in to a service $S$, they send the master password $pwd$ to the store, the store computes a PRF $PRF(k, pwd | S)$ and this will be sent to $S$ as the user's password for $S$. This means that if the store is compromised, the master password and the account passwords can't be learned unless the user communicates with the store. So this works well if the store is in local, offline hardware, where the user is unlikely to use the store after it is compromised by an attacker.

However, the authors go further and replace the PRF with an oblivious PRF. This means the store computes an "encrypted" version of $PRF(k, pwd | S)$ from an "encrypted" $pwd|S$, so doesn't learn the plaintext values of the master password or the service password. In practice this can be achieved by the user (i.e. the user's machine) hashing the string $pwd | S$ into an element $g$ of a Diffie-Hellman group, then computing $h = g^r$, where $r$ is a fresh, random exponent, and sending $h$ to the password store. The store's secret key is an exponent $a$, so it computes $h^a$ and sends this back to the user. The user removes the blinding exponent $r$ (i.e. computes $(h^a)^{r^{-1}} = g^a$) and the result is the unique password for $S$. Now even when the password store is compromised and even if the user communicates with the store, the master password and the account passwords can't be learned.

In principle an attacker could recover all the account passwords by compromising both the password store and a service $S$, learning the secret key $a$ and the service password $g^a$, computing $g = H(pwd|S)$ and perfoming an offline dictionary attack to find $pwd|S$. Then for any other service $S'$, the password can be computed via $H(pwd|S')^a$. But as long as $S$ follows good practice and only stores a hash $H'(g^a)$ of the service password, this attack fails: an offline dictionary attack to recover $g^a$ is unfeasible as it's essentially a random group element.

There are no particularly expensive computations involved in using SPHINX, the communication between the user and SPHINX does not need to be secure (so it could be online somewhere) and the store will work regardless of what password protocol is used by the service, so it's extremely flexible. SPHINX therefore strikes me as both useful and practical, which is surely the definition of Real World Cryptography.

Monday, January 9, 2017

RWC 2017 - Post-quantum cryptography in the real-world

A new year takes off and, along with it, thousands of resolutions are formulated. Although I am not the right person to talk about them (my diet will begin next Monday), I wish to discuss a resolution that the cryptographic community as a whole has set for itself in this 2017. Because that's what people do at Real World Crypto (RWC): they talk about new threads, topics could be worth exploring during the new year, directions for their researches and interests. This year, for the first time in RWC, post-quantum cryptography (PQC) was given an entire session, clear sign that time is changing and the moment has come to bring the discussion to the real world. The message is clear: even if quantum computers are not popping up in tomorrow's newspapers, we can't postpone any longer.

A very simple reason for this was given by Rene Peralta, of the NIST PQC team, during the overture of the session: standardisation takes time, up to seven years if we start right now, and full transition takes even longer. I found Rene's presentation to be neat and direct: our public-key cryptography fails against quantum computers and our symmetric one needs some (non-drastic) modifications. The resolution is to "start thinking about it this year, possibly by November 30th, 2017". However, a question arises quite naturally: are we ready?

The other three talks of the session tried to answer in the affirmative. Among the several PQC proposals that are around in theoretical papers, two made their ways into RWC: the well-stablished lattice-based cryptography and the new-born isogeny-based cryptography, which nevertheless carries the pride and sympathy of ECC.

Lattices and funny names: NewHope and Frodo and Crystals
Lattice-based cryptography has three representatives in the run for PQC schemes. Valeria Nikolaenko showed two: the first one is called NewHope and is a key agreement protocol based on the hardness of Ring-LWE. The latter is a problem very favourable to applications because it combines sound theoretical security (worst-case to average-case reduction) to fast implementations thanks to specific choices of parameters which allow for speed-ups in the computations: NewHope turns out to be even faster than ECC and RSA, but at the price of a larger communication. However, there are some concerns on the security of LWE when the ring structured is added. Thus, Frodo ("take off the ring") is designed to achieve the same goal using only standard LWE. The drawback is a degradation in performance, since the tricks hinted above cannot be used anymore and keys are generally bigger.

The third lattice-based scheme was presented by Tancrede Lepoint and is a suite called Crystals. This is based on yet another kind of lattices: module lattices, for which it is also known a worst-case to average-case reduction. These are less structured lattices (hence possibly calming down the detractors of ring structure) in which similar implementation speed-ups are possible: the timing is indeed comparable to NewHope's, while the communication is improved.

"Make elliptic curves great again"
Michael Naehrig presented a new proposal for PQC: do you remember curves with plenty of small subgroups where to easily solve the discrete logarithm problem? Now they come in handy again: all the subgroups (of order 2 and 3) are considered to be nodes of a graph, whose edges are the isogenies (a.k.a. bijetive homorphisms between curves). In this new context, given two curves in the graph, it is difficult to come up with the isogeny linking the two. However, such a new approach doesn't really stand against other solutions: keys are small but performance is not a pro (so to speak).