jlm-blog
~jlm

14-Jun-2024

Ridiculous bug with std::random_shuffle

Filed under: programming — jlm @ 17:49

Right now I’m very irate at C++’s STL because its random_shuffle algorithm has a landmine associated with it. I was using it with a vector, which I’ll refer to as v. The call which shuffles v’s contents is random_shuffle(v.begin(), v.end(), rfunc), where rfunc is a function object which, in the case of int-indexable vectors, accepts a single int n and returns a random int in the range [0, n). What’s the obvious way to get a random int in some range? uniform_int_distribution, of course. And boom, there goes the dynamite because uniform_int_distribution(0, n) generates a random int in the range [0, n], not [0, n).

Besides the need to use uniform_int_distribution(0, n-1) being counter-intuitive, some other attributes make this bug frustrating. While random_shuffle requires rfunc’s return value to be in [0, n), it doesn’t check that it is, even when debugging is enabled — instead, it’s our old friend undefined behavior, no compiler diagnostic provided. vector bears some blame too, as it’s happy to silently munge data pointed to by an out-of-range iterator even with debugging enabled, optimization off, warnings enabled — and again, undefined behavior, no compiler diagnostic provided. If 0 is a valid value of a vector element (naturally, for me it was), then the effect of the bug when v.size() is less than v.capacity() is to get a shuffled vector in which one of its elements has been replaced by 0, and the downstream effects of that can easily be too subtle to notice (naturally, for me it was). So, you only get crashes when v.size() == v.capacity() and something in v.data corrupts and gets corrupted by whatever’s just past it in memory — which is a freakin’ rare occurrence!

How do you debug this kind of thing? Since it’s an intermittent bug which any given run of the program is highly unlikely to trigger, turn on core dumps so that when it eventually does occur, you have the dump of the program state available to debug right there. Turn on debugging so you can get the most out of your corefiles. Stick in code which lets you track what you need to track. Here, this shim function is instructive: int f() { int rand_val = uid(rand_eng); return rand_val; }. You need to remove -O to prevent calls to f() from being replaced by calls to uid(rand_eng) and use -O0 to prevent the transient variable rand_val from disappearing. (volatile probably would work also.)

Don’t forget that this is terrible API design, though. We have a library which has a generator of random ints and a consumer of random ints, and their conventions are different enough that the obvious way to combine them is buggy, yet similar enough that they’ll never trigger compiler warnings (much less errors). That’s not a good API choice, that’s a trap for API users! It’s good to have obvious ways to perform the operations API users will want to perform provided they work correctly. Since API design is hard, often you can’t do that. In those cases where you can’t make an obvious use that works, don’t introduce the obvious use in the first place — it’s far worse than having no obvious use at all, because if you make an obvious way to perform an operation, people will perform the operation that way, simply because it’s the obvious way. It should never be wrong. That there are so many ways to perform unsafe operations that are not obvious, not caught by C++ compilers, and not caught at runtime is the thing I dislike most about this programming language. Today I learned one more way to do it, and I should be well beyond that point.

No Comments »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a comment

Powered by WordPress