Donenfeld: Random number generator enhancements for Linux 5.17 and 5.18

[Posted March 18, 2022 by jake]

Jason Donenfeld has published a lengthy look at the changes to the Linux random-number generator (RNG) for Linux 5.17 and the upcoming 5.18 kernel. It covers his efforts "to modernize both the code and the cryptography used" and also peers into the future for changes that may be coming.

random.c was introduced back in 1.3.30, steadily grew features, and was a pretty impressive driver for its time, but after some decades of tweaks, the general organization of the file, as well as some coding style aspects were showing some age. The documentation comments were also out of date in several places. That’s only natural for a driver this old, no matter how innovative it was. So a significant amount of work has gone into general code readability and maintainability, as well as updating the documentation. I consider these types of very unsexy improvements to be as important if not more than the various fancy modern cryptographic improvements. My hope is that this will encourage more people to read the code, find bugs, and help improve it. And it should make the task of academics studying and modeling the code a little bit easier.


Great article!

Posted Mar 18, 2022 15:46 UTC (Fri) by vstinner (subscriber, #42675)

I'm always impressed that commit messages are way longer than the actual changes in the Linux kernel commits:

* https://git.kernel.org/pub/scm/linux/kernel/git/crng/rand...
* https://git.kernel.org/pub/scm/linux/kernel/git/crng/rand...

This article is a great overview of the current Linux RNG implementation and history of the code (especially the recent history).

Great article!

Posted Mar 18, 2022 17:38 UTC (Fri) by wtarreau (subscriber, #51152)

> I'm always impressed that commit messages are way longer than the actual changes in the Linux kernel commits:

Quite frankly, this should be the norm everywhere. If a good explanation can avoid some code, better put that into the commit message and use it. At least the what, why, and how must be mentioned. Personally, I even encourage developers to mention the alternatives that were considered and why they were dropped. If you spent a week thinking about different approaches, you can certainly devote 10 minutes to summarizing your conclusions, in the hope that 5% of the time it will save you another week in the future when git bisect tells you that your patch is faulty.

Great article!

Posted Mar 18, 2022 22:38 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

It's not so great because commit logs are not easy to grep or index. Ideally this kind of explanation should be in nicely structured text files.

Great article!

Posted Mar 19, 2022 1:29 UTC (Sat) by zx2c4 (subscriber, #82519)

Hopefully with the RNG you're getting the best of both worlds:

- Commit messages describe justification for _changes_. That is, a rationale for going from one state to another state. This discussion generally includes some research on the old state in order to motivate the new state, in addition to information about the new state.

- random.c's code comments and section headers now have lots of up to date information. These describe the present state and are useful as documentation, especially to people hacking on the code, since it's beside the code.

What we don't currently have, but something I am working toward, is a detached description of the RNG and the motivations for various design decisions. This will be useful to researchers and others trying to get a good high-level description of how the thing works and the motivations for its various components. I expect this aspect of the documentation to congeal as the RNG's design does. But hey, what's already happened in 5.17/5.18 is a pretty big improvement on the status quo.

Great article!

Posted Mar 19, 2022 6:24 UTC (Sat) by bartoc (subscriber, #124262)

True, however associating such information with the change itself (and thus _when_ the change was made) makes the historical context pretty clear.

Great article!

Posted Mar 22, 2022 9:16 UTC (Tue) by jacmet (subscriber, #19734)

> It's not so great because commit logs are not easy to grep or index

git log --grep works pretty well.

Great article!

Posted Mar 23, 2022 9:45 UTC (Wed) by MrWim (subscriber, #47432)

> It's not so great because commit logs are not easy to grep or index.

On the contrary - I find it one of the easiest places to look. I use all of `git log --grep`, the GitHub search bar, `git gui blame`, `git log -S` (pickaxe) and the VSCode git lens depending on what I'm looking for and what environment I'm using.

The beauty of it is that it's context, context, context.

* Commits (including but not limited to the commit message) provide context for a changed line. I try to provide as many references in my commit messages as I can. This includes references to other commits, or links to bug-trackers or documentation or stack-overflow posts, etc.
* Commits also exist in their own historical context - both in terms of other commits on the branch, what was happening on other branches around the same time, and how and when the commit was merged. The beauty of this is that it happens automatically.
* This context is immediately available in your editor when looking at the code too. With git blame and associated editor integration you can see how the code you're looking at came to be like this. The "why" of some code is immediately accessible. You might argue that this information should be in the comments, and in some cases this is true, but you can be a lot more verbose in a git commit message than in a code comment, and with blame *every line* has some message associated with it, with additional context a few clicks away.

> Ideally this kind of explanation should be in nicely structured text files.

I think separate documentation works best for high-level design type documentation that's unlikely to change quickly, and user facing documentation. Less so when things change quickly, or need to be cross-referenced with code.

The beauty of commit messages is that the message is attached to the code, and additional context is captured automatically. With separate documentation you need to refer to the relevant parts of code (and keep those references up to date). You also need to manually include a bunch of context so that future people can understand the documentation - much of which would be included automatically by git. It's also harder to find the relevant documentation from the code, while from any given line the commit message is just a click away. Documentation gets out of date - so do commit messages, but at least with commit messages you can know when it wasn't out of date and you can see exactly the code that it applied to.

Also: if you're writing documentation why not include it in the commit message too? Copy and paste is easy and there's no harm to the duplication. Sometimes I feel like developers are so used to applying DRY to their code that they get carried away and don't ever want to press Ctrl-C, Ctrl-V.

I can understand how people feel differently. A lot of the value of these tools only comes if developers take care to make their commits atomic and provide useful information in their commit messages. Worse: you can individually start applying this discipline to a project but you won't start seeing the benefits of doing so for at least several months - it's only really helpful once you've forgotten the context in which the code you're looking at was written.

Great article!

Posted Mar 19, 2022 9:50 UTC (Sat) by dottedmag (subscriber, #18590)

There are two kinds of documentation:
- how/why this thing is changed
- how/why this thing is done this way

The question to distinguish them is "would this documentation be still necessary if the code, as it appears after this commit, was there from the beginning of the repository?"

Putting the first kind of documentation in a commit message is a no-brainer. Putting the second kind there is a great way to make it inaccessible. Cryptic two lines of code, no comments next to it? Only the most disciplined engineers will dig the history to find out why. Cryptic two lines of code, 200 lines of comment explaining why the code is here, and why it is done this way? Nobody reading the code will miss it.

Great article!

Posted Mar 19, 2022 9:51 UTC (Sat) by dottedmag (subscriber, #18590)

Oops, I meant to reply to this comment: https://lwn.net/Articles/888434/

Great article!

Posted Mar 19, 2022 10:19 UTC (Sat) by Wol (subscriber, #4433)

Great comment, though ...

Documentation

Posted Mar 19, 2022 13:05 UTC (Sat) by tialaramex (subscriber, #21167)

Comments however are the last resort, because we're writing code which inherently _has_ meaning. We should try to capture as much as possible of our meaning in that code, where a maintenance programmer is obliged to change it if they change the intended meaning later, whereas the comments can become out-of-sync with the code either through oversight or misunderstanding.

Modern optimising compilers can assist greatly here by allowing us to more often express what we actually meant, rather than focusing on what code will deliver the best performance. And languages can help too; this is one of the opportunities for Rust once that's an option.

C doesn't even have iterators; you can't express "are all the doodads in this bag of doodads sparkly?" except by explicitly enumerating them, checking whether each is sparkly, and then discarding the enumeration. The reader must either imagine you actually wanted the enumeration (you didn't) or notice that you only use it as a way to iterate through the doodads and discard it once you determine whether they're all sparkly. If you have a mix of any, all and none predicates to evaluate, the shortcuts needed to provide good performance are different for each, so either the reader must discern what's going on or the resulting loops are distracting because each one is different.

Documentation

Posted Mar 19, 2022 13:37 UTC (Sat) by dottedmag (subscriber, #18590)

> Comments however are the last resort, because we're writing code which inherently _has_ meaning.

No, it does not.

> We should try to capture as much as possible of our meaning in that code, where a maintenance programmer is obliged to change it if they change the intended meaning later, whereas the comments can become out-of-sync with the code either through oversight or misunderstanding.

Yes, they can. However, comments are hard to ignore if one is unable to understand the code without them.

> Modern optimising compilers can assist greatly here by allowing us to more often express what we actually meant,

They are useless.

> this is one of the opportunities for Rust once that's an option.

It is completely useless.

> C doesn't even have iterators, you can't express "are all the doodads in this bag of doodads sparkly?" except by explicitly enumerating them

Iterators don't help.

Let me explain what kinds of detailed comments are useful (picked at random from a codebase):

- «Calling this external API simultaneously using the same API key causes it to lock up, so there is a mutex»
- «Detaching a disk from a VM before Linux has booted causes a desynchronization of disk names inside the VM and in the libvirt, so wait for an r/w activity on the disk as a telltale of mounted root filesystem before detaching it»
- «This certificate is used in a closed-loop system, so CA/B requirements do not apply to it»
- «Here's the reverse-engineered description of undocumented IPC»
- «mount(2) contains a lot of historical baggage, and that baggage, unfortunately, got leaked to the external system interface, so the following code mimics the mount(8)»

These are facts about other systems.

Documentation

Posted Mar 19, 2022 13:47 UTC (Sat) by jreiser (subscriber, #11027)

> C doesn't even have iterators, you can't express "are all the doodads in this bag of doodads sparkly?" except by ...

Although the language itself does not have the explicit constructs that you mention, all is not lost. The C language allows the programmer to mark sections with motivating comments (/* Iterate over the bag, check for sparkly, discard the dullards */), use braces to delimit blocks or compound statements that implement the strategy, and even use carefully-chosen subroutines. Good code still is possible, and it's not that much more work.

Documentation

Posted Mar 19, 2022 15:05 UTC (Sat) by rgmoore (✭ supporter ✭, #75)

> Comments however are the last resort, because we're writing code which inherently _has_ meaning.

Not at all. Code, or at least clearly written code, is excellent at explaining what it does, but it is normally silent on why it was done that way. You can give hints through suggestive variable names and the like, but ultimately explaining your design decisions needs to be done through comments, commit messages, or other forms of documentation.

Documentation

Posted Mar 19, 2022 16:11 UTC (Sat) by kschendel (subscriber, #20465)

> Comments however are the last resort, because we're writing code which inherently _has_ meaning.

Sorry, that's one of the Big Lies of programming. It didn't work with COBOL, it doesn't work with C, and it certainly won't work even with the Great New Pie-in-the-Sky language. I've heard variations on this statement for over 45 years and it's never been true. As long as we have computers that operate step by step rather than intuiting what we mean, we'll need comments in and on the code.

Documentation

Posted Mar 20, 2022 0:52 UTC (Sun) by wblew (subscriber, #39088)

> Comments however are the last resort, because we're writing code which inherently _has_ meaning.

The code cannot answer two critical questions about itself...

A) WHY the code does its thing.

B) What was intended, as contrasted with what the code actually does.

Without knowing A it's difficult to change the code with confidence.

Without knowing B it's difficult to know what is a defect and what is intentional behaviour.

Documentation

Posted Mar 20, 2022 21:11 UTC (Sun) by rlhamil (guest, #6472)

"Comments however are the last resort, because we're writing code which inherently _has_ meaning."

In a perfect world, sure. But humans notoriously bring their own degree of understanding and context to the use of any language, even languages intended for programming, and no nominal evaluation of their qualifications will always be effective in assuring they're prepared to deal with what to them is obscure. Until all programmers are replaced by AIs coding everything else, we're stuck with supporting functional behavior despite imperfection. Maybe even then, if you're paranoid. :-)

An (old-timer) example:

while (*dst++ = *src++)
    ;

For someone familiar with C idiom (especially if they learned from e.g. K&R), no big deal. Otherwise, although that's correct and concise for copying null-terminated strings (or zero-terminated arrays of any integer-like type, so e.g. *src and *dst could refer to wchar_t rather than char) to a destination presumed to have sufficient space, it's got a LOT going on in a short space. It causes brain bleeds for some trying to learn C: pointers are a common stumbling block, and while this expression is well-defined, similar-looking combinations of increments (such as modifying the same variable twice in one expression) may not be, so someone might be tempted to generalize wrongly. Using subscripts may be more readable, but I suspect this is among the more efficient ways to do it. It could perhaps be made more readable as

while ((*dst++ = *src++) != '\0')
    ;

and a compiler should generate exactly the same code for it, since comparing against '\0' (which is just the integer constant zero) is the same test the first form performs implicitly.

Things like that are more or less idioms that one simply has to learn and recognize, even if one can also aspire to analyze them in painful detail. Even

/* idiom: see K&R */

would be some help there.

Moreover, some languages or projects have conventions for embedding documentation in comments, so it's all (ideally) kept current together. While I don't know of a specific example, that might include not only user or API documentation, but maintainer documentation too, with perhaps either separately retrievable from the source file. That might or might not be over and above inline comments.

OTOH, comments like

/* OMG, deadline is f-ing killing me! */

probably don't contribute much, give or take historical context and a possible warning of quality issues. And your future AI replacement will probably waste cycles dealing with humorous comments. :-)

Documentation

Posted Mar 21, 2022 0:54 UTC (Mon) by Wol (subscriber, #4433)

Perfect example of idioms which some cultures would recognise instantly, others would struggle with ...

When I learnt C I went on a course, and we were asked to write a snippet of code to count how many 3's there were in our hand of cards. I wrote mine, and when I read it out for the instructor to write up on the board, he just could NOT hear what I was saying - I had to spell it out letter by letter, because his mind kept changing what I was saying.

struct card {
    char suit;
    int value;
} card[13];

int ii, count;

for (ii = 0, count = 0; ii < 13; ii++)
    count += (card[ii].value == 3);

Okay, my C is rusty, but that last line just felt so "wrong" to the instructor that until I forced him to write it character by character he just couldn't even hear me right ...

But to me that was just the completely obvious way to do it, because my main programming language was DataBasic, and pretty much EVERY program you write in DataBasic would use that idiom many times. It was just the standard way of counting how many elements are in a dynamic array:

NO.ELEMENTS = COUNT( ARRAY, DELIMITER) + (ARRAY NE "")

Cheers,
Wol

Great article!

Posted Mar 21, 2022 7:36 UTC (Mon) by eduperez (guest, #11232)

In other worlds, those messages would probably go into a "pull request", not the individual changes themselves.

Great article!

Posted Mar 21, 2022 11:11 UTC (Mon) by taladar (subscriber, #68407)

If there is anything that is even harder to find later than the commit message of the original commit, it would be the comments in the pull request/merge request.

Great article!

Posted Mar 21, 2022 11:12 UTC (Mon) by kleptog (subscriber, #1183)

That irritates me. I know some projects where everything is put in the MR and the commit messages are of the form "modified file X". If you do a "git pull" you get a repository with code but no explanation of what's happening. You need to visit gitlab for that. The fact that merges are done using fast-forward is probably partially to blame.

I'd much prefer people wrote useful commit messages, and then the MRs copy from that. Then at least the repo by itself contains useful info.

Great article!

Posted Mar 22, 2022 9:00 UTC (Tue) by eduperez (guest, #11232)

I think there should be a balance: a PR where the overall idea of the change is explained, then a commit message on each commit explaining the fine details.

Great article!

Posted Mar 24, 2022 14:07 UTC (Thu) by jezuch (subscriber, #52988)

> I'd much prefer people wrote useful commit messages, and then the MRs copy from that. Then at least the repo by itself contains useful info.

That's exactly what I do. Also, if the MR/PR consists of a single commit, both Github and Gitlab will copy its message to the MR/PR description. It's very convenient, as it saves me from doing the copying myself :)