Ridiculous bug with std::random_shuffle

Filed under: programming — jlm @ 17:49

Right now I’m very irate at C++’s STL because its random_shuffle algorithm has a landmine associated with it. I was using it with a vector, which I’ll refer to as v. The call which shuffles v’s contents is random_shuffle(v.begin(), v.end(), rfunc), where rfunc is a function object which, in the case of int-indexable vectors, accepts a single int n and returns a random int in the range [0, n). What’s the obvious way to get a random int in some range? uniform_int_distribution, of course. And boom, there goes the dynamite because uniform_int_distribution(0, n) generates a random int in the range [0, n], not [0, n).

Besides the need to use uniform_int_distribution(0, n-1) being counter-intuitive, some other attributes make this bug frustrating. While random_shuffle requires rfunc’s return value to be in [0, n), it doesn’t check that it is, even when debugging is enabled — instead, it’s our old friend undefined behavior, no compiler diagnostic provided. vector bears some blame too, as it’s happy to silently munge data pointed to by an out-of-range iterator even with debugging enabled, optimization off, warnings enabled — and again, undefined behavior, no compiler diagnostic provided. If 0 is a valid value of a vector element (naturally, for me it was), then the effect of the bug when v.size() is less than v.capacity() is to get a shuffled vector in which one of its elements has been replaced by 0, and the downstream effects of that can easily be too subtle to notice (naturally, for me it was). So, you only get crashes when v.size() == v.capacity() and something in v.data corrupts and gets corrupted by whatever’s just past it in memory — which is a freakin’ rare occurrence!

How do you debug this kind of thing? Since it’s an intermittent bug which any given run of the program is highly unlikely to trigger, turn on core dumps so that when it eventually does occur, you have the dump of the program state available to debug right there. Turn on debugging so you can get the most out of your corefiles. Stick in code which lets you track what you need to track. Here, this shim function is instructive: int f() { int rand_val = uid(rand_eng); return rand_val; }. You need to remove -O to prevent calls to f() from being replaced by calls to uid(rand_eng) and use -O0 to prevent the transient variable rand_val from disappearing. (volatile probably would work also.)

Don’t forget that this is terrible API design, though. We have a library which has a generator of random ints and a consumer of random ints, and their conventions are different enough that the obvious way to combine them is buggy, yet similar enough that they’ll never trigger compiler warnings (much less errors). That’s not a good API choice, that’s a trap for API users! It’s good to have obvious ways to perform the operations API users will want to perform provided they work correctly. Since API design is hard, often you can’t do that. In those cases where you can’t make an obvious use that works, don’t introduce the obvious use in the first place — it’s far worse than having no obvious use at all, because if you make an obvious way to perform an operation, people will perform the operation that way, simply because it’s the obvious way. It should never be wrong. That there are so many ways to perform unsafe operations that are not obvious, not caught by C++ compilers, and not caught at runtime is the thing I dislike most about this programming language. Today I learned one more way to do it, and I should be well beyond that point.


The Shockwave Flash that was

Filed under: web — jlm @ 09:55

swf is dead, a closed-source project shut down by its owner (Adobe) a few years ago.
Kazerad, best known as the artist-author of the web comic Prequel Adventure, wrote a touching eulogy for it on twitter here.

[Read it “unrolled” (all 15 parts together) here.]


How is this glibc deadlock still unaddressed?

Filed under: programming — jlm @ 13:29

Why Programming is Hard, Volume CXX: the maintainers of critical pthread implementations will ignore multiple bugfixes from parallel programming experts for a longstanding deadlock bug for three years and counting.


Do not meddle with the Application Binary Interface

Filed under: programming — jlm @ 15:18

Why Programming is Hard, Volume CXIX: a comprehensive explanation by JeanHeyd Meneide of how ABI stability stops us from having nice things despite enormous amounts of high-quality work on ABI-consistency and related issues. Naturally, it’s a dispiriting read. Fortunately, half a year later there’s this follow-up on how to fix it.


twitcode #5: Extract HyperRogue achievements from log file

Filed under: programming — jlm @ 17:28

The game HyperRogue is very interesting. Its main gimmick is that it’s played on a hyperbolic plane (surface with negative curvature) instead of an Euclidean plane (surface with zero curvature) like 99.99% of 2-D games are. Playing in this kind of geometry is compellingly mind-bending. Other than the geometry involved, the main way the game differs from typical roguelikes is that the player character doesn’t gain levels or skills in the course of play — instead, the game opens up new “lands” with new environmental challenges and/or monsters with new attributes as the player demonstrates their skill by meeting the unlock requirements for each land.

The game supports “achievements”, where whenever your gameplay meets the condition of some challenge (some very easy, some nightmarishly hard, most of medium difficulty), it records this in a log file. Of course, met achievements aren’t all that the log file contains, but it’s nice enough to start each “met achievement conditions” record with “ACHIEVEMENT” and a marginally-descriptive achievement name, followed by other relevant information. Unfortunately, you can’t get a list of your fulfilled achievements merely by running  grep '^ACHIEVEMENT ' hyperrogue.log  because the game logs when you fulfill an achievement’s conditions separately for every run, not just the first run to fulfill it. So, to extract the records of each achievement’s first fulfillment, we ignore any record with an achievement name we’ve seen before and print out the line otherwise (ie, if we haven’t seen it before), which is nigh-trivial in awk [link]:

    if (!($2 in ach)) {
        ach[$2] = 1;

At 86 characters, it’s the fifth example from me of “twitcode” (programs under 140 bytes). And being in awk, it’s the fifth language I’ve twitcoded as well.


Who’s logic bombing whom?

Filed under: covid, humor, web — jlm @ 16:58

This is how we know Star Trek is fiction:
[Available vaccines are unavailable]

This kind of thing is how humans break computers in that world. Here in reality, it’s how computers break us humans.


One line of javascript to be console friendly

Filed under: web — jlm @ 08:38

From early on in styling this blog, I’ve had the sidebar on the top right, with the main blog content flowing against it, then after it ends lower in the page, below it. (You won’t notice this if your browser is very wide, as I cap the width of the main body at 50 em, but if you shrink it some you’ll see this.) Frankly, I’m surprised this is so unusual in blog and related webpage themes, which generally keep the space beneath sidebars empty, just wasting that space. (At least, when it’s not filled with ads.) This is super easy to do: in the page’s layout, have the sidebar’s markup come before the main body and give it the CSS style rule float: right. The browser will do everything from there, filling the page with the content that follows the sidebar’s markup, line wrapping when it hits the sidebar while also filling the space below it, doing a better job than most javascript-based layout renderings do.

There’s only one thing I consider a problem with this technique. In text console browsers (lynx, links, elinks, etc.), it shows the sidebar before the main content because the sidebar is in the markup earlier, which is way less friendly than if it appeared after. If I move it after the body, then it shows up better on the console, but now it’s at the bottom of the page in the graphical browsers used by all sane people, which is no good. I’ve had a soft spot for the console browsers from almost my first encounter with the web, long before I started this blog. The reasons for that is another story, but there’s no way I’m going to sacrifice the view in graphical browsers for improving the console browser experience, so I’ve kept the sidebar earlier in the markup than the main content all 15 years I’ve run this blog. And all that time I’ve felt tiny, insignificant tinges of regret for not offering a better experience for the extremely rare visitor from a console browser.

Every so often I’d think about tweaking WordPress to swap the order of the main content and the sidebar if the user-agent was a console browser, but that seemed inelegant and ugly, plus likely to be a pain with doing it in PHP, learning the appropriate WordPress internals, and likely having to carry a patch forward across WordPress versions. Or I’d think about doing it in javascript: have the sidebar be after the content in the markup so it shows up well in the console browsers that don’t execute javascript, but run a script to move it earlier in the graphical browsers that do. But then I’d have to learn javascript and DOM manipulation, which seemed like it was going to be a pain too.

Well, it turns out it is so not a pain it’s almost funny. I’m familiar with giving elements “names” with the id HTML attribute, for use in sub-page links and CSS #-references. Those names turn out to be great for getting DOM handles in javascript: just call document.getElementById("TheName"). And moving an element from where it is in the original markup to inside another element is just NewEnclosingElement.appendChild(ElementBeingMoved). Putting those together, I simply:

  • Replaced the <?php get_sidebar(); ?> in the theme’s header.php file with <div id="sidebar"></div>
  • Gave that element a float: right CSS rule
  • Added this to the theme’s footer.php file:
    <?php get_sidebar(); ?>
    <script type="text/javascript">

(WordPress names the <div> that encloses the sidebar “menu”. I picked the name “sidebar” for the <div> which “menu” gets moved into.)

And … that’s it. The blog now looks like I want it to in both lynx and firefox. I probably shouldn’t have waited 15 years to look into this!


TIL: gshadow has user lists

Filed under: linux — jlm @ 21:37

While I was chatting (well, rapid-fire emailing) with a friend who works as a system administrator, he dropped a bit of Linux trivia on me: It’s not just /etc/group which has user lists, the /etc/gshadow file also has user lists — more than /etc/group does, even! After the crypted password is a list of group administrators, then a list of shadow members. The former have the ability to change the group’s password as well as its membership using the gpasswd command. The latter can make the group be their primary group by calling newgrp without needing a password. (See man gshadow.)


Simple Scala puzzle

Filed under: programming — jlm @ 10:46

Who can guess what this tiny* Scala program does?

object O extends App {
    val x = 123456789L
    val y = 0F
    val z = x − (if (true) x else y)

(Apparently this mysterious snippet had been going around a while ago, but I only recently encountered a variation.)

Answer: Prints “−3.0”.


Short answer: The if/else causes x to get promoted to Float, which rounds it up to 123456792.

Long answer after the fold.



Unit tests are not enough

Filed under: programming — jlm @ 20:48

[I wrote this for another blog back in 2011, but it was lost to public view when that blog disappeared. It’s still valid today, though!]

In one of our datastructures, there’s an entry for a 64-bit unique resource identifier. This turned out to be a little too vague a description: Is it an 8-byte array, a 64-bit wide integer, or an instance of a 64-bit UUID class? One of our developers though one thing, and another thought another. No problem, right? The first compile by the one who chose wrong will show the type error. Except we’re using Python, so no type checking until the code runs. Still no problem, because our devs wrote unit tests, so the type error will show up on their first run, right?

Unfortunately, unit tests help not an iota in situations like this. The developer expecting UUID objects writes unit tests which consume UUID objects, and the dev expecting byte arrays writes unit tests which produce byte arrays.

So, integration tests, then? It turns out our devs did these too, verifying that their code integrated with the fancy datastructure class, but this still didn’t catch the mismatch! This is because the datastructure class doesn’t actually do anything with the resource identifier that would trigger a type check: it just accepts it as an entry in a slot when an item is inserted, and gives it out as one of the slot entries when items are looked up, and Python is happy to let an instance of any type pass through unchecked.

It’s only when we plug everything together that this trivial type mismatch shows up. So the moral: Make end-to-end tests early. You need them to turn up even problems as basic as this, and the earlier you find problems, the better.

Powered by WordPress