This turned out to be quite a good idea. Implementing cols2chars() and chars2cols() in XS instead of Perl makes them at least 10 times faster. I tested it on four strings: two ASCII and two Unicode, a long and a short of each.
In fact, in some cases it turns out to be 24 times faster.
I haven't looked into why in much detail, but I suspect much of the reason is that the XS functions primarily walk along the internal UTF-8 representation of the strings, counting bytes, characters, and columns as they go, and returning the appropriate count(s) when required. The pure-Perl implementation doesn't have direct access to the byte offsets, so it only has character numbers to work with. The frequent character-to-byte and byte-to-character conversions at the boundaries between the functions mean the string's UTF-8 bytes have to be skipped over and counted again each time a function is entered or left, which generally slows it down.
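To illustrate the kind of single-pass walk I mean, here is a minimal sketch in plain C. It is not the actual XS code: the names walk_to_col(), utf8_next(), cp_width() and struct pos are made up for this example, the width rule is a crude stand-in for a real width table or wcwidth(), and the input is assumed to be valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified width rule; real code would use a
 * proper width table or wcwidth(). */
static int cp_width(unsigned long cp)
{
    if (cp >= 0x0300 && cp <= 0x036F)
        return 0;                       /* combining marks take no column */
    if ((cp >= 0x1100 && cp <= 0x115F) ||
        (cp >= 0x2E80 && cp <= 0xA4CF) ||
        (cp >= 0xAC00 && cp <= 0xD7A3) ||
        (cp >= 0xFF00 && cp <= 0xFF60))
        return 2;                       /* rough set of double-width ranges */
    return 1;
}

/* Decode one UTF-8 sequence at s (assumed valid and complete).
 * Stores the codepoint in *cp and returns the sequence length in bytes. */
static size_t utf8_next(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12) |
              ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((unsigned long)(s[0] & 0x07) << 18) |
          ((unsigned long)(s[1] & 0x3F) << 12) |
          ((unsigned long)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* All three counters advance together, so any of them can be reported
 * at the end without a second pass over the string. */
struct pos {
    size_t bytes;
    size_t chars;
    size_t cols;
};

/* Walk forward until the next character would exceed target_cols columns,
 * or the end of the string is reached. */
static struct pos walk_to_col(const char *str, size_t len, size_t target_cols)
{
    struct pos p = { 0, 0, 0 };
    const unsigned char *s = (const unsigned char *)str;

    while (p.bytes < len) {
        unsigned long cp;
        size_t n = utf8_next(s + p.bytes, &cp);
        size_t w = (size_t)cp_width(cp);

        if (p.cols + w > target_cols)
            break;                      /* this character would not fit */

        p.bytes += n;
        p.chars += 1;
        p.cols  += w;
    }
    return p;
}
```

A cols2chars()-style caller would read the chars field of the result, and a caller that needs to slice the buffer would read bytes; either way the string is only scanned once. A version working purely in character offsets has to rediscover the byte position on each call, which is the extra work I suspect the Perl version is paying for.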
As for the original test failure, it turned out to be down to an entirely unrelated lack of locale support in the platform's libc. The XS implementations fail there in the same way. But having implemented the above improvements, I decided to leave them in anyway.
XS faster than Pure Perl; who'd have thought it?