Thursday, January 23, 2014

Random Number Generation, Revisited

On the last day of the workshop Yevgeniy Dodis gave a fast-paced talk on his recent CCS paper "Security analysis of pseudo-random number generators with input: /dev/random is not robust" [http://www.cs.nyu.edu/~dodis/ps/rng.pdf]. It had something for everyone, from a complicated security model to brightly coloured charts showing implementation performance data.

Classically, a PRNG is modelled as a function that takes a small secret seed of good randomness and expands it into more randomness. However, finding this initial core of randomness is hard, so PRNGs in the real world take in extra input from whatever unpredictable sources they have access to. In practice (in Linux) this data comes from the timing of disk accesses, mouse movements, keystrokes, and system interrupts [http://eprint.iacr.org/2006/086.pdf].
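Such a "PRNG with input" can be pictured as a pair of operations: refresh, which folds an unpredictable event into the state, and next, which expands the state into output. The sketch below is a hypothetical toy using SHA-256 as both the mixing and expansion function (not the Linux design), just to fix the interface:

```python
import hashlib

class PRNGWithInput:
    """Toy PRNG with input (hypothetical sketch, not /dev/random).

    refresh() folds an unpredictable event into the internal state;
    next() expands the state into output bits, as in the classical
    seed-then-expand model described above.
    """

    def __init__(self):
        self.state = bytes(32)  # starts empty: no entropy yet

    def refresh(self, event: bytes) -> None:
        # Mix a new input (e.g. an interrupt timestamp) into the state.
        self.state = hashlib.sha256(self.state + event).digest()

    def next(self, n: int) -> bytes:
        # Derive output and a fresh state separately, so output bits
        # do not reveal the state that produced them.
        out = hashlib.sha256(self.state + b"out").digest()
        self.state = hashlib.sha256(self.state + b"state").digest()
        return (out * ((n // 32) + 1))[:n]
```

The interesting design questions (which Dodis et al. formalise) are how much to trust each refresh and when next is safe to call.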

In 2005 Barak and Halevi [https://eprint.iacr.org/2005/029.pdf] formally modelled the idea of a PRNG with input and came up with the security property of robustness, capturing the idea that the PRNG can recover from internal state compromise. Dodis et al. strengthen this notion, requiring that it do so even if fresh entropy accumulates slowly.

The adversary is powerful: it has control over the input, as long as it gives an accurate bound on the entropy of the data stream it provides. Understanding of the security model is hampered by the bizarre choice to refer to the public parameters as a seed, while the value S plays the role of the seed in a traditional PRNG. But in the real world no one cares about the choice of notation in your security model, so on to the practice.

The potential problems uncovered in the Linux PRNG that prevent it from achieving robustness are twofold: an entropy estimator given the hugely difficult task of keeping track of how much entropy is in the internal state, and a mixing function that folds in new entropy in a linear way. The entropy estimator values irregular events highly, so a stream of regular events carrying unpredictable data is judged to have low entropy, while irregular events carrying predictable data are judged to have high entropy. Dodis et al. use this observation to fool the PRNG into outputting bits when its internal state holds insufficient entropy.
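The bias can be illustrated with a toy estimator that, as in the description above, credits entropy purely from timing irregularity and never looks at the event data at all (a hypothetical simplification, not the kernel's actual estimator):

```python
def entropy_credit(timestamps):
    """Credit each event by the bit-length of the smallest of its
    first- and second-order timing differences. A toy version of the
    timing-based heuristic described above; the event payload is
    never examined."""
    credit, prev_t, prev_d = 0, None, None
    for t in timestamps:
        if prev_t is not None:
            d = abs(t - prev_t)
            # Second-order difference: zero for perfectly periodic events.
            dd = abs(d - prev_d) if prev_d is not None else d
            credit += min(d, dd).bit_length()
            prev_d = d
        prev_t = t
    return credit

# Periodic arrivals earn (almost) nothing, however unpredictable the
# data they carry; jittery arrivals earn credit even if the data is
# completely predictable.
regular = entropy_credit([1000, 2000, 3000, 4000])
jittery = entropy_credit([1000, 2017, 3003, 4040])
```

An attacker who controls event timing can therefore make the estimator's count diverge arbitrarily from the true entropy in the pool.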

This attack relies on the PRNG ignoring fresh entropy when it estimates it already has enough, which the maintainer of the Linux PRNG states is not how it behaves [https://news.ycombinator.com/item?id=6550256]. However, no one is claiming this is a realistic attack, and a more valid criticism Dodis made is that the Linux PRNG is very complicated.

To this end he presented a simple construction meeting the strong notion of robustness. It makes use of a traditional PRNG and some field arithmetic, and looks clean and modular, while the pretty charts show it is competitive with the Linux PRNG on performance. The construction echoes Barak and Halevi, and further back Ferguson and Schneier's Fortuna [https://www.schneier.com/book-ce.html], in ditching the entropy estimator. The takeaway point is that entropy estimation is hard, so let's just not do it.
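The shape of the construction is: refresh is linear over a finite field (multiply the state by a public parameter X and add the new input), and next re-keys through a standard PRG. The sketch below follows that shape over GF(2^128), with SHA-256 standing in for the PRG; the field size, hash, and parameter choices here are illustrative, not the paper's exact instantiation:

```python
import hashlib

# x^128 + x^7 + x^2 + x + 1, irreducible over GF(2) (the GCM polynomial).
IRRED = (1 << 128) | 0x87

def gf128_mul(a: int, b: int) -> int:
    """Carry-less multiplication reduced mod IRRED: arithmetic in GF(2^128)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> 128:        # reduce whenever a overflows 128 bits
            a ^= IRRED
        b >>= 1
    return r

class RobustPRNG:
    """Sketch in the style of the Dodis et al. construction
    (simplified; SHA-256 stands in for the PRG)."""

    def __init__(self, x: int):
        self.x = x  # public parameter (the "seed" in the paper's terminology)
        self.s = 0  # accumulator state

    def refresh(self, inp: bytes) -> None:
        # S <- S * X + I over GF(2^128): linear mixing, no entropy estimate.
        i = int.from_bytes(hashlib.sha256(inp).digest()[:16], "big")
        self.s = gf128_mul(self.s, self.x) ^ i

    def next(self, n: int) -> bytes:
        # Derive output and a fresh accumulator from the current state.
        sb = self.s.to_bytes(16, "big")
        out = hashlib.sha256(sb + b"out").digest()
        self.s = int.from_bytes(hashlib.sha256(sb + b"key").digest()[:16], "big")
        return (out * ((n + 31) // 32))[:n]
```

Because refresh is linear and stateless about entropy, the security argument can be phrased as "after enough total entropy has been absorbed, the state is recovered", with no estimator in the loop.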



1 comment:

  1. I slightly disagree with the conclusion of this article:
    In my opinion Dodis cleverly showed that we don't need entropy estimation to prove a PRNG design secure. We can merely assume statements like "as long as the environment provides me with this much entropy during this amount of time, I know I'll recover".
    It is a beautiful result and I'm looking forward to having the time to read the details of the proof, but from a practical point of view it can only bring a cleaner design to the Linux kernel RNG (which is already not nothing), because somehow we'll still need to be sure that the above statement is true, which to my knowledge can only be done through entropy estimation.
