Yet another neglected technology blog!: Parsing tricky whitespace with awk.

Tuesday, September 4, 2012

Parsing tricky whitespace with awk.

I spent a bunch of time trying to make awk properly parse some tricky whitespace in a text file. Essentially, I have a backup log file with a list of files transferred. Given the following example output:

create d 755 0/0 512 .ccache
create d 755 0/0 512 .ccache/0
create d 755 0/0 512 .ccache/0/0
same 644 0/0 6558 .ccache/0/0/1ebefd822077669fa42316da42e2c4-11870.manifest
same 644 0/0 8251 .ccache/0/0/5ae5f481490e100d6d85b693aee8df-3158.manifest
same 644 0/0 8407 .ccache/0/0/70b8ebd6f2cb5085c0f828e9145216-5033.manifest
same 644 0/0 359 .ccache/0/0/7de2817532c5d1365a70d8dec4e378-30356.manifest

I want to grab the file size, nominally the 4th column, so that I can sort on it and find large files. The problem is, if I use a simple awk statement like

awk '{print $4}'

the file size is either the fourth or fifth column, depending on whether the first (logical) column has a space in it. I tried accounting for this with more complex awk, like so

awk 'BEGIN {FS="[ ]{2,}"} { print $4 }'

but to no avail. Admittedly, my awk-fu is weak. I did find a solution, or rather, a colleague came up with one:

sed "s/create d/created/" | awk '{print $4 }'

A bit outside of the box, but clever enough that I want to remember it.
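For comparison, here is a minimal awk-only sketch that sidesteps the shifting column by counting from the end of the line instead of the start. It assumes the size is always the field immediately before the file name and that file names contain no whitespace; neither is guaranteed by the log format shown above. The function name print_sizes is made up for illustration.

```shell
# Hypothetical helper: read backup log lines on stdin, print the size column.
# Assumption: the size is the second-to-last whitespace-separated field,
# which only holds if file names never contain spaces.
print_sizes() {
    awk '{ print $(NF-1) }'
}

# Example: feed in a few lines from the log above and sort largest first.
printf '%s\n' \
    'create d 755 0/0 512 .ccache' \
    'same 644 0/0 6558 .ccache/0/0/1ebefd822077669fa42316da42e2c4-11870.manifest' \
    'same 644 0/0 359 .ccache/0/0/7de2817532c5d1365a70d8dec4e378-30356.manifest' |
    print_sizes | sort -rn
```

If file names can contain spaces, this breaks and the sed rewrite is the safer choice. As for why the FS="[ ]{2,}" attempt failed, one possibility is that some awk implementations (for example gawk before 4.0, unless run with --re-interval) do not treat {2,} as a repetition count. That is only a guess, since I did not note which awk I was using.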
