2010/04/26

Order matters even when it doesn't

Revision control diffs are most readable when they aren't noisy. Operations that disturb the order of many lines in a file create noise, which makes it hard to spot the interesting change. YAML specifies mappings (hashes, to us Perl types), which are unordered associations of keys to values. Even though YAML doesn't put an ordering on those, sometimes we'd like to pretend that it does, so as to preserve the order when we load a file, edit the data, then dump it back to the file.

At work we store a YAML document in Subversion, which describes a lot of details about IPsec tunnels. In an ideal world this would be the initial source of the information. The world, as you may have observed, is not yet ideal, so this file is in fact back-scraped from information in the actual config files, to keep it up to date. Naturally that's done in Perl.

The YAML file stores a big mapping, each entry itself being a record-like mapping containing details in named keys. This causes great trouble for our load/edit/dump script, because YAML doesn't specify an ordering of mapping keys, so they'll be dumped in "no particular order". This wouldn't normally be a problem, except that because the file is stored in Subversion, a commit changing one line of actual detail might suddenly produce hundreds of lines of false diff, because of reordered keys.
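To make the size of the problem concrete, here is a rough sketch in Python (not the Perl used at work) that uses only the standard library. The tunnel names, field values and the `dump` helper are all made up for illustration; it simply counts changed lines after a one-value edit, first with key order preserved and then with the keys written in a different (sorted) order.

```python
import difflib

def dump(tunnels, key_order=None):
    """Render tunnels as YAML-like lines; key_order optionally reorders each record's keys."""
    lines = []
    for name, record in tunnels.items():
        lines.append(f"{name}:")
        keys = key_order(record) if key_order else list(record)
        for key in keys:
            lines.append(f"  {key}: {record[key]}")
    return lines

def changed_lines(before, after):
    """Count the added and removed lines in a line-by-line diff."""
    return sum(1 for line in difflib.ndiff(before, after) if line[:2] in ("+ ", "- "))

# Hypothetical data: 40 tunnels, each with keys in a hand-written order.
tunnels = {
    f"Tunnel{i}": {
        "peer": f"10.0.0.{i}",
        "psk": "s3kr1t",
        "domains": f"192.168.{i}.0/24-10.1.{i}.0/24",
    }
    for i in range(1, 41)
}

original = dump(tunnels)
tunnels["Tunnel7"]["psk"] = "n3w-s3kr1t"   # the one real change

preserved_noise = changed_lines(original, dump(tunnels))
reordered_noise = changed_lines(original, dump(tunnels, key_order=sorted))

print("order preserved:", preserved_noise, "changed lines")
print("keys reordered: ", reordered_noise, "changed lines")
```

With the order preserved, the diff is just the one line removed and one added; with the keys reordered, every record contributes changed lines even though nothing in it was edited.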

To solve this, I had to apply Much Evil Hackery. The YAML Perl module, it turns out, has a data structure tied to a hash, which remembers the order of keys. By subclassing YAML::Loader and replacing its method to read a mapping into a hash ref, we can force it to use this structure instead. This alteration is transparent to the Perl code in between, which just sees a normal hash. However, YAML::Dumper sees the ordering and preserves it when it writes out.
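The mechanism is easier to see in miniature. What follows is a Python sketch of the idea only, not the YAML module's actual code or API: `OrderRememberingMapping`, `load_record` and `dump_record` are invented names, and the toy loader and dumper handle nothing beyond flat `key: value` lines. Python's built-in dict already keeps insertion order, so the explicit key list is there purely to show what an order-remembering hash has to do.

```python
from collections.abc import MutableMapping

class OrderRememberingMapping(MutableMapping):
    """Behaves like a plain mapping but iterates keys in first-insertion order."""

    def __init__(self):
        self._values = {}   # key -> value lookup
        self._order = []    # keys in the order they were first stored

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        if key not in self._values:
            self._order.append(key)   # new keys go to the end
        self._values[key] = value     # existing keys keep their position

    def __delitem__(self, key):
        del self._values[key]
        self._order.remove(key)

    def __iter__(self):
        return iter(self._order)

    def __len__(self):
        return len(self._order)


def load_record(text):
    """Toy loader for flat 'key: value' lines; stands in for the replaced loader method."""
    record = OrderRememberingMapping()
    for line in text.splitlines():
        key, _, value = line.partition(": ")
        record[key] = value
    return record


def dump_record(record):
    """Toy dumper: writes keys in whatever order the mapping iterates them."""
    return "\n".join(f"{key}: {record[key]}" for key in record)


source = "peer: 1.2.3.4\npsk: s3kr1t\ndomains: 192.168.0.1/24-10.1.0.2/24"
record = load_record(source)
record["psk"] = "n3w-s3kr1t"   # the code in between treats it as an ordinary mapping
result = dump_record(record)
print(result)
```

The editing code never needs to know about the ordering; only the loader (which builds the structure) and the dumper (which iterates it) are involved.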

The upshot: Load/edit/dump of trees of mappings in YAML preserves ordering, allowing cleaner commits into revision control.

This has been suggested as a wishlist bug against YAML; see also https://rt.cpan.org/Ticket/Display.html?id=56741

4 comments:

  1. I'm using YAML::Tiny for something that sounds VERY similar. I think it sorts hash keys alphabetically by default, which means that I only get slightly larger diffs if I add an entry manually in a "wrong" place. This ordering may be an implementation detail, though.

  2. Oh, you can make the YAML module itself sort the keys alphabetically, that's simple enough. Getting -a- defined order is fine, if the program is generating YAML output, to be repeatable. The problem I had was that I was already given an existing, hand-written file, whose keys were in an order that seemed sensible at the time, to a human reader. Preserving that arbitrary order is the fun part.

  3. The "YAML Tiny" subset syntax does not support ordered hashes.

  4. I'm not talking about ordered hashes in the output YAML (omap, et al.). I'm talking about having control over the order in which the normal keys of a normal hash are written to the output. So I can force the output to be

    OurFriends:
      peer: 1.2.3.4
      psk: s3kr1t
      domains: 192.168.0.1/24-10.1.0.2/24

    and know that those three keys, peer, psk and domains, will always come in that order, simply because that was the order they were loaded in when I checked out the file from SVN in the first place.

    The idea is that

    $ svn up
    $ ./update-the-conf.pl vpns.yaml
    $ svn ci -m "Changed stuff"

    will not disturb the order of lines in the SVN-controlled file.
