Ross Codes! Sorting Human-Readable Numbers

I’ve been running Linux at home for a few years now. One of the things I like best about it is that tasks tend to be built up from lots of small command-line programs instead of big GUI programs. This may seem to make it harder to use, but that’s only true for things you plan on doing once. If I want to, say, resize the 500 pictures I took of my little boy over the weekend (he is darned cute), I can do it with some big GUI tool: load each picture, click Resize, move some sliders, hit OK, click Save As, type in a new file name. Five hundred times. Or I can write this:
mkdir -p output; for x in *.jpg; do convert "$x" -resize 1280x1024 "output/$x"; done
Having a rich command line available lets me operate on large sets of data in batches, which is exactly what computers are good at.
But that’s a bit of a tangent. When I am working in Linux, I often find myself dealing with big numbers. File sizes. Free memory. Free disk space.
Because I rip all my DVDs to the hard drive, I’m very concerned about free disk space, so I’ll run “df”:

Filesystem             1K-blocks       Used  Available Use% Mounted on
torchwood:/mnt/store0 4326436544 3654545536  671891008  85% /mnt/store0
saxon:/mnt/store1     2130562560 1073968640 1056593920  51% /mnt/store1
saxon:/mnt/store2     2145245184  467011584 1678233600  22% /mnt/store2
badwolf:/mnt/store3   5768575488 4445833216 1322742272  78% /mnt/store3


But those numbers start to blur together after a while. Fortunately, df has a “-h” option that makes its output “human readable”:

Filesystem            Size Used Avail Use% Mounted on
torchwood:/mnt/store0 4.1T 3.5T  641G  85% /mnt/store0
saxon:/mnt/store1     2.0T 1.1T 1008G  51% /mnt/store1
saxon:/mnt/store2     2.0T 446G  1.6T  22% /mnt/store2
badwolf:/mnt/store3   5.4T 4.2T  1.3T  78% /mnt/store3


A lot easier to read. Several of the standard Linux commands have a similar “-h” option, including ls and du, and free has a comparable “-m” option that reports memory in megabytes.
The disadvantage of the human-readable flag is sorting. The standard sorting command, sort, has a flag (-n) that makes it compare numbers by value. But once the numbers have been mangled into human-readable form, this breaks: sort -n reads only the leading digits and ignores the unit letter, so suddenly 1G sorts ahead of 10k.
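To make the failure concrete, here is a tiny shell example with made-up sample values. Because sort -n stops reading each number at the unit letter, the suffix is simply ignored:

```shell
# Hypothetical sample values. sort -n reads only the leading number on each
# line and stops at the unit letter, so "1G" compares as 1 and "10k" as 10.
result=$(printf '10k\n1G\n512\n' | sort -n)
printf '%s\n' "$result"
# prints 1G, then 10k, then 512: the gigabyte value lands first
```

In ascending order the one-gigabyte line comes out on top, even though it is by far the largest value.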
So I wrote this quick-and-dirty little Perl script, which sorts lines of text while correctly ordering numbers that have been converted to “human readable” format in the style of df and du.
In case anyone finds it handy, this is hsort.
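The Perl source of hsort does not appear in this text, so the following is not the author's script. It is a hypothetical shell function, hsort_sketch, showing one common way to get the same effect (decorate, sort, undecorate): awk converts a size such as 1.5M into a plain byte count and prefixes it to each line, sort -n orders on that key, and cut strips the key off again. Assumptions: the size sits in a single whitespace-separated field (the hypothetical first argument, default 1), and the suffixes K, M, G, T, P and E mean powers of 1024, as they do for df -h and du -h.

```shell
# hsort_sketch [FIELD]: hypothetical sketch, NOT the author's hsort.
# Sorts stdin by a human-readable size in whitespace-separated field FIELD
# (default 1). Assumes K/M/G/T/P/E suffixes are powers of 1024, as in df -h.
hsort_sketch() {
    awk -v f="${1:-1}" '
    BEGIN { units = "KMGTPE" }
    {
        v = $f
        key = 0    # anything that is not a size (e.g. a header) gets key 0
        # Accept a plain number with an optional one-letter unit suffix.
        if (v ~ /^[0-9]+(\.[0-9]+)?[KkMmGgTtPpEe]?$/) {
            # Position of the suffix in units: K=1, M=2, ... or 0 if none.
            n = index(units, toupper(substr(v, length(v), 1)))
            key = v + 0    # awk converts the leading numeric part: 1.5M -> 1.5
            for (i = 0; i < n; i++) key *= 1024
        }
        # Decorate: emit the byte count, a tab, then the original line.
        printf "%.0f\t%s\n", key, $0
    }' |
    sort -n -s -k1,1 |    # numeric sort on the key; -s keeps ties in input order
    cut -f2-              # undecorate: drop the key and its tab
}
```

For example, `df -h | hsort_sketch 4` would list filesystems from least to most available space. The header's "Avail" is not a size, so it gets key 0 and, thanks to the stable sort, stays on top. Negative numbers and decimal commas are not handled.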
