Tuesday, April 17, 2012

Eurocrypt 2012: A Tutorial on High Performance Computing applied to Cryptanalysis (invited talk)

Antoine Joux gave the first invited talk of the conference, on High Performance Computing for Cryptanalysis. The focus was on best practice, and he evidently spoke from experience, with much advice on lessons learnt from problematic as well as fruitful past endeavours.

HPC projects in this field (usually motivated by record breaking or benchmark setting) have unusual goals that set them apart from typical programming tasks, and the priorities are accordingly very different. Portability and re-usability, for example, are not important (and may even be counter-productive), because the code is generally intended for one-time use against a specific task. What is crucial is that the code be simple, easy to modify, and free of reliance on libraries or unnecessary 'featurism'. One nice feature specific to cryptanalytic challenges is that correctness of the eventual output is generally easy to check, because the answer is known a priori.

Antoine identified four steps in applying HPC in a cryptanalytic context:
  1. Identify an algorithmic starting point.
  2. Find the requisite computing power.
  3. Program (and debug/optimise as appropriate).
  4. Run and manage the actual computation.
Possible applications include lattice reduction, collision finding, index calculus and decomposition algorithms (knapsacks, for example). Toy implementations can sometimes be interesting in themselves, and sufficiently enlightening that a full-scale extension is not required. If a large-scale HPC implementation is useful or needed, it is important to decide on a realistic goal, so as to be reasonably sure of obtaining results within the time frame you plan to devote to the project. For example, are you aiming for a reduced-size proof-of-concept or a full-scale demo with cryptographic-size parameters?
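
To make the "toy implementation" point a little more concrete, here is a minimal sketch of a birthday-style collision search, the kind of thing that might serve as a starting point for the collision-based applications above. Everything in it is illustrative: the mixing function is just a stand-in for a truncated hash and the table size is an arbitrary choice of mine, none of it taken from the talk.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    #define TABLE_BITS 20
    #define TABLE_SIZE (1u << TABLE_BITS)

    /* Stand-in mixing function, truncated to 32 bits -- not a real hash. */
    static uint32_t toy_hash(uint64_t x) {
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;
        return (uint32_t)x;
    }

    int main(void) {
        /* one remembered preimage per bucket; 0 marks an empty slot */
        uint64_t *pre = calloc(TABLE_SIZE, sizeof(uint64_t));
        uint32_t *val = calloc(TABLE_SIZE, sizeof(uint32_t));
        if (!pre || !val) return 1;

        for (uint64_t x = 1; ; x++) {
            uint32_t h = toy_hash(x);
            uint32_t slot = h & (TABLE_SIZE - 1);
            if (pre[slot] != 0 && val[slot] == h) {
                printf("collision: %llu and %llu both map to %08x\n",
                       (unsigned long long)pre[slot], (unsigned long long)x,
                       (unsigned)h);
                break;
            }
            pre[slot] = x;   /* overwriting older candidates is fine for a toy */
            val[slot] = h;
        }
        free(pre);
        free(val);
        return 0;
    }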

The choice of computing power is generally motivated by whatever is available, and of course there are different advantages and disadvantages inherent in the various options.

Researchers with the means to source their own dedicated machines have full control over architectural choices and do not have to compete with other users for resources. However, such an approach does not scale well, and the size of the possible undertaking will eventually hit a limit.

One alternative Antoine mentioned was to get multiple computer users to run computations in the background on their machines -- so exploiting idle cycles on desktops. The total available power would potentially be huge and much more easily scaled. However, the experimenters would have no control over the choice of architecture, they would eventually be constrained by limited communication bandwidth, the possibility of 'adversarial' participants would need to be guarded against, and the code itself would need to be extremely user-friendly. This did sound like a very challenging scenario to work in (Antoine himself did not seem keen on the idea).

The third option mentioned was to apply for compute time on shared HPC resources -- so gaining access to high-end, dedicated computing power with fast communications. Challenges remain, though: one is then constrained to the existing architecture, and one has to factor in the new difficulty of job management in a multi-user context. Lastly, if physical access to a high-performance computer is not available, then HPC "in the cloud" could be an option.

Once appropriate resources have been identified, the researcher can progress to the task of writing the code. Antoine stressed that the goal here should be simple, low-level code, avoiding libraries and prioritising on-the-fly adaptability over re-usability or portability. Optimise where possible, but not at the risk of introducing nasty surprises further down the line.
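
As a caricature of that style, here is what such a "flat" search kernel might look like: nothing beyond libc, the work range passed on the command line (which also makes splitting the job across machines, or resuming it, trivial), and a single placeholder test_candidate() routine standing in for the real computation. The criterion inside it is a meaningless dummy of my own, not anything from the talk.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /* Placeholder for the real test; the criterion here is a meaningless dummy. */
    static int test_candidate(uint64_t k) {
        return ((k * 0x9e3779b97f4a7c15ULL) >> 40) == 0xdeadbeULL;
    }

    int main(int argc, char **argv) {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <start> <end>\n", argv[0]);
            return 1;
        }
        uint64_t start = strtoull(argv[1], NULL, 0);
        uint64_t end   = strtoull(argv[2], NULL, 0);

        /* flat inner loop: trivial to modify, split across machines, or resume */
        for (uint64_t k = start; k < end; k++) {
            if (test_candidate(k))
                printf("hit at %llu\n", (unsigned long long)k);
        }
        return 0;
    }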

The final stage, of actually running and managing the computation, is often seen as tedious and difficult. Antoine offered many helpful tips from personal experience. Here are a few of them:
  • Scale up slowly to the intended size.
  • Expect software failure -- where phases are found to not scale well, be prepared to re-program on-the-fly.
  • Rare bugs can be hard to detect, so check intermediate data carefully.
  • Expect hardware failure -- there will always be a power-down risk, so make sure your code is restartable (a minimal sketch of this appears after the list). Faults can damage computations, so, again, check intermediate data.
  • Remember to factor in fluctuating availability of shared resources (avoid tight schedules).
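
On the restartability point, a minimal checkpointing pattern looks something like this: periodically dump whatever state the computation needs in order to resume, and reload it on start-up, so that a power cut costs at most one checkpoint interval. This is my own sketch, with a made-up file name, interval and "work" step, not code from any of the projects mentioned.

    #include <stdio.h>
    #include <stdint.h>

    #define CHECKPOINT_FILE  "state.chk"
    #define CHECKPOINT_EVERY 1000000ULL

    struct state { uint64_t next_k; uint64_t partial; };  /* whatever must survive a restart */

    static void save_state(const struct state *s) {
        FILE *f = fopen(CHECKPOINT_FILE ".tmp", "wb");
        if (!f) return;
        fwrite(s, sizeof(*s), 1, f);
        fclose(f);
        rename(CHECKPOINT_FILE ".tmp", CHECKPOINT_FILE);  /* replace old checkpoint */
    }

    static int load_state(struct state *s) {
        FILE *f = fopen(CHECKPOINT_FILE, "rb");
        if (!f) return 0;
        int ok = fread(s, sizeof(*s), 1, f) == 1;
        fclose(f);
        return ok;
    }

    int main(void) {
        struct state s = { 0, 0 };
        load_state(&s);                       /* resume if a checkpoint exists */

        for (uint64_t k = s.next_k; k < (1ULL << 32); k++) {
            s.partial ^= k * 0x9e3779b97f4a7c15ULL;   /* placeholder for real work */
            if ((k + 1) % CHECKPOINT_EVERY == 0) {
                s.next_k = k + 1;
                save_state(&s);
            }
        }
        remove(CHECKPOINT_FILE);              /* finished: clear the checkpoint */
        printf("done, partial = %016llx\n", (unsigned long long)s.partial);
        return 0;
    }
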
He then gave some anecdotal examples of large scale projects that he had been involved with, such as elliptic curve point counting. The algorithmic starting point was Reynald Lercier's 1997 PhD thesis [link], and the computation involved two phases: computing partial modular information, then pasting it together using collision search. A classical match-and-sort implementation required around one month… only there was a power shut-down after three weeks! Rather than switch to a re-programmed version with built-in restart capability, they developed a new approach which used 4 CPUs and took just one night (obviously, power stability is far less of a concern over such a short period). The technique is described in Joux's 2001 paper "Chinese and Match" [link].
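
For readers unfamiliar with the generic match-and-sort idea (as opposed to the improved "Chinese and Match" technique), the skeleton is simply: build two lists of candidate "tags", sort both, and scan them in step looking for equal values. The rough sketch below fills the lists with dummy 32-bit tags from a made-up generator; in the real application the tags would be derived from the partial modular information computed in the first phase, and none of this reflects the actual point-counting code.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    #define N (1u << 20)

    static int cmp_u64(const void *a, const void *b) {
        uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
        return (x > y) - (x < y);
    }

    /* Made-up tag generator: 32-bit outputs so cross-list matches exist. */
    static uint64_t dummy_tag(uint64_t i, uint64_t salt) {
        uint64_t x = i * 0x9e3779b97f4a7c15ULL + salt;
        x ^= x >> 29; x *= 0xbf58476d1ce4e5b9ULL; x ^= x >> 32;
        return x & 0xffffffffULL;
    }

    int main(void) {
        uint64_t *left  = malloc(N * sizeof(uint64_t));
        uint64_t *right = malloc(N * sizeof(uint64_t));
        if (!left || !right) return 1;

        for (uint64_t i = 0; i < N; i++) {
            left[i]  = dummy_tag(i, 1);
            right[i] = dummy_tag(i, 2);
        }
        qsort(left,  N, sizeof(uint64_t), cmp_u64);
        qsort(right, N, sizeof(uint64_t), cmp_u64);

        /* merge-style scan: advance whichever side holds the smaller tag */
        for (size_t i = 0, j = 0; i < N && j < N; ) {
            if (left[i] < right[j])      i++;
            else if (left[i] > right[j]) j++;
            else {
                printf("match on tag %llx\n", (unsigned long long)left[i]);
                i++; j++;
            }
        }
        free(left);
        free(right);
        return 0;
    }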
