jlm-blog
~jlm

29-Dec-2016

A puzzle about C’s stdio

Filed under: programming — jlm @ 18:06

I found the C puzzles webpage by Gowri Kumar to be a very interesting collection of oddities of the C language and some of its basic libraries. If you work with C for fun or profit, I encourage you to go and give them a try. I found very few of them to produce behavior I hadn’t expected, which could be a symptom of overfamiliarity with C. I did find a few surprises though, which I felt warranted further investigation. (more…)

28-May-2016

twitcode #3: new mail in mbox

Filed under: programming — jlm @ 09:12

Once upon a time, people’s interactions with computers (those few people who got to interact with computers directly) was mostly through a teletype: a combination of a keyboard where they could type instructions to the computer and a printer where it gave the responses back to them. This model is tenaciously clung to by a handful of still-active projects such as gdb, but the bulk of its use nowadays is from command shells (bash, zsh, cmd.exe) because command-response interaction is much easier to specify, record, automate, examine, modify, and perform remotely in a teletype-style than a GUI-style.
(more…)

24-Nov-2015

twitcode #2: stream filter to decode MIME

Filed under: programming — jlm @ 12:12

Messing around with some mail handling scripts, I was surprised I didn’t find any good ways to decode MIME as a stream filter. Ten minutes later, I have 13 lines of Perl which do it in 201 characters in my normal non-terse style. It’s great for normal use, but a tiny bit of golfing fits it in a tweet’s 140-character limit:

$ cat mime_decode.pl
#!/usr/bin/perl -w
use strict; use utf8; use MIME::WordDecoder; 
binmode(STDOUT, ":utf8");
while (<>) { print mime_to_perl_string($_); }
$ wc mime_decode.pl
  4  16 137 mime_decode.pl

Good thing there was already a method which does all the real work…

23-Dec-2012

Addressing the fragile base class problem

Filed under: programming — jlm @ 21:47

I’ve been thinking about the fragile base class problem lately. (Yes, I know it’s almost Christmas. My mind works mysteriously.) I started thinking by analogy to APIs, which the interface a superclass gives a subclass in fact is, even if it’s not called that. So, the superclass’s API changes, breaking the subclass, just like a regular API’s change can break a client. How do we deal with this with regular APIs? If we are to make a compatibility-breaking change (which introducing any member into a superclass potentially is), we version the API so that a client requesting version 1 semantics gets them while only clients written against the newer semantics will request version 2. We could do the same kind of thing with class inheritance if we mark everything with revision numbers, which we reference when inheriting.

class base@2 {
    void start@1();
    void stop@1();
    void idle@2();
};

class child@1 extends base@1 {
    void idle@1();
    void park@1();
};

Here’s our classic case of a fragile base class. child subclassed base and defined the new method idle(), then later base was extended with its own method idle(). Normally, this would cause a problem — the new stop() implementation might call idle() perhaps, and child’s idle() won’t be written with overriding a then-nonexistent base::idle() in mind. But with these revision markings, we say that child only overrides methods marked as being in revision 1 of base. So, when stop() calls idle(), it gets base::idle, not child::idle, and when park() calls idle(), the call resolution goes the other way.

The problem I see with this though, is that when going to an indirect superclass, it can be unclear which revision that should be.

class grandparent@3 {
    void method@2();
};

class parent@2 extends grandparent@2;

class child@1 extends parent@1 {
    void method@1();
};

Uh-oh. Should child’s method() override grandparent’s? If parent@1 extended grandparent@2, then yes. But if it extended grandparent@1, then no. So do we need to list the parent class revisions of every revision of the child class? I’d hope there’d be a better way. Perhaps we’d be relying on an IDE to handle the revision numbers for us, keeping them updated is just a dumb task, so in that case the IDE could maintain the manifest of parent revisions too.

6-Oct-2011

Better SelectableChannel registration in Java NIO

Filed under: programming — jlm @ 12:01

Java NIO’s Selector class is surprisingly difficult to use with multiple threads. Everyone that tries it encounters mysterious blocking, much of which is due to it sharing a lock with SelectableChannel.register. So, if you happen to try to register a channel in a thread other than the selector thread, it blocks that thread until the select is done. Boo.

So, this is a NIO rite of passage of sorts, finding this misfeature and then looking up how to work around it. The usual answer is to keep a ConcurrentQueue of pending registrations, and have your select loop process that queue between select calls. Uggggleeee. It occurred to me that using a synchronization lock, we can do better.

To register a channel and get the SelectionKey:

  synchronized(registerLock) {
      selector.wakeup();
      key = channel.register(selector, operations, attachment);
  }

And in the select loop:

  // before
  synchronized(registerLock) {}
  // between
  numEvents = selector.select(timeout);
  // after

If the loop is before or after when our registration block takes the registerLock, we’re fine, as having the registerLock prevents the loop from reaching select until we’ve registered and released the lock. If the loop is inside select(), then the wakeup() will cause it to exit select, and it won’t re-enter because we hold the registerLock, so we’re fine.

The tricky case is when the loop has the registerLock or is between releasing the registerLock and entering select(). In these cases, the registration block takes the registerLock and races with the loop over select() and wakeup(). Fortunately, the NIO designers anticipated that programmers would have a desire to ensure a Selector wasn’t selecting, even if the wakeup was called in the window between checking it was okay to enter select and actually entering it. Selector.select() returns immediately if wakeup() had been called after that Selector’s prior select(). So, our race doesn’t matter, the select() always exits, and we’re safe.

This is so much simpler than building up a queue of registrations and processing them in the select loop, and we get the SelectionKey right away, I wonder if I’m missing something. Why is the textbook technique to use a ConcurrentQueue, instead of a synchronization lock like this?

8-Feb-2011

How to shoot yourself in the foot with Python

Filed under: programming — jlm @ 21:40

Accidentally compare a character and an integer.
$ python -c 'print "\0" > 1024'
True

Wait, what? Any character is larger than any integer? Why are chars and ints intercomparable then? Shouldn’t I get either a type error or a meaningful comparison? This seems like it’s guaranteed to be wrong!

4-Mar-2010

twitcode: automatic AFS Kerberos ticket renewals

Filed under: programming — jlm @ 17:29

If you’re a user of AFS with Kerberos, you’ve no doubt been annoyed at your ticket expiring and having to run aklog to get a new one. You may have tried to script running aklog automatically periodically, but been stymied that aklog won’t refresh an about-to-expire ticket. So, you have to scrape the output of klist for the expiration time and wait until then — easy enough. So simple that my script to do so, nicely indented and with declared integer variables for now and then weighed in at a paltry 353 bytes.

So, I figured if I inlined everything, I could fit it in a twitter post. And lo, it does:

while sleep $((`date -d "$(klist -5 | tail -1 | awk '{ print $3 " " $4 }')" +%s` - `date +%s`)); do aklog; done

That’s 112 characters, including unnecessary spaces for readability and the trailing newline, leaving plenty of room for extras you might think up.

Now, if only I could fit all this commentary in a twitter post…

4-Oct-2009

That’s one way to ensure a language never goes mainstream

Filed under: programming — jlm @ 09:09

Quick quiz: How do you find the arguments to your Scheme program?
  a) argv
  b) *argv*
  c) command-line
  d) program-arguments

Answer? It depends on whether you’re using Guile, SCM, umb-scheme, or mzscheme! That’s right, it’s impossible to write a Scheme program that takes arguments and is portable across implementations. Winnar.


Update: In 2013, option (c) was standardized.

3-Apr-2008

Regex breakpoints in gdb

Filed under: programming — jlm @ 12:49

gdb has this awesome feature that nobody seems to have heard of (including myself before today). You can set breakpoints by regex!

“help rbreak”

16-Dec-2006

The C aliasing rules

Filed under: programming — jlm @ 11:49

A lot of people are confused by the pointer aliasing rules introduced by C99. (Sometimes called “strict aliasing”.) First, what is aliasing? When two pointers refer to the same location, they’re called “aliases”.
void fcn(int *p) { int *q = p; ... }
Here p and q are aliases. The general rule is that pointers to the same type are allowed to alias, and pointers to different types are not. (Differences like signed/unsigned and const/non-const aren’t considered significant here– they’re all in the same “aliasing class”.)
void fcn(float *p) { int *q = (int *) p; ... }
This is illegal, because p and q are aliases, but one points to float while the other points to int. In practical terms, the optimizer will reorder accesses to *p and *q across each other, or optimize out writing/reading the value back into/from memory, because they’re in different alias classes, and you’ll get mysterious bugs.
There are two exceptions to this rule: void * and char *. Pointers of these two types are allowed to alias any kind of pointer. void * can be assigned from and to any pointer without a cast, but you’re only allowed to assign it to a type which reflects the original pointer (or char *).
void intfcn(void *p) { int *q = p; ... }
void floatfcn(void *p) { float *q = p; ... }
void fcn(void) {
  int i;
  intfcn(&i);  /*  OK: int * -> void * -> int *  */
  floatfnc(&i);  /*  Bad: int * -> void * -> float *  */
}

This style is used a lot in callbacks, where it’ll look something like:
void callsoon(void (*fcn)(void *), void *arg);  /* Call fcn(arg) in one second */
void fcn(void) {
  static int i = 1;
  static float f = 0.1;
  callsoon(intfcn, &i);
  callsoon(floatfcn, &f);
}

The compiler is still assuming your int * and float * pointers won’t alias here: It’s up to you to ensure that the void * which got a float * doesn’t get turned into an int * and vice-versa. The void * is telling the compiler “I can ensure non-aliasing on my own, without you checking”. You should not use it to silence aliasing warnings, because you get those warnings in situations where the compiler is going to re-order accesses, and the void * gives it free license to do so silently!
What do you do when you really need to alias across types? First, don’t do a straight pointer typecast or go through a void * typecast for the reasons above:
void fcn(float *p) { int *q = (void *)p; ... }
void fcn(float f) { int i = *(int *)&f; ... }
void fcn(float f) { int i = *(int *)(void *)&f; ... }  /* All illegal aliasing */

Another common but illegal technique is to go through a union:
union { int i; float f; } x; x.f = f; i = x.i;  /* Illegal: unions can only be read from the last type assigned to them */
So, what do you do? One of the most common use cases is for transferring byte representations, and memcpy() works fine for this (any modern compiler will optimize it into stores).
void fcn(float *p) { int i; memcpy(&i, p, sizeof(i)); ... }  /* Safe if sizeof(int) == sizeof(float) */
void fcn(float f) { int i; memcpy(&i, &f, sizeof(i)); ... }  /* Ditto */

void fcn(float *p) { int *q; memcpy(&q, &p, sizeof(q)); ... }  /* Illegal: just a fancy way of writing q = (int *)p */
If you need fine-grained access, this is where char * comes in. It’s allowed to alias any other pointer, so the compiler can’t re-order pointer accesses across char * accesses unless it can ensure non-aliasing by other means.
void fcn(float *p) { char *q = (char *)p; ... }  /* Safe */
Note that just going though a char * on your way to an illegal alias doesn’t make it legal:
void fcn(float *p) {
  char *q = (char *)p;  /* q now validly aliases p */
  int *r = (int *)q;  /* r now validly aliases q, but p and r are still in different alias classes and *p and *r can be re-ordered across each other */
  ...
}

void fcn(float *p) { int *q = (int *)(char *)p; ... }  /* Illegal aliasing of p and q, same as q = (int *)p */
char * is safe, as long as you do your references through it, as chars (bytes), but just having a char * there doesn’t make non-char * references legal.

Powered by WordPress