Yesterday my friend Marcelo Nery and I found a bug in the split command of Mac OS. At first, I was surprised (we don't expect to find bugs in core tools like grep, ls, split, wc…, right?) and almost expected to find the same bug in the Linux version of split. At least in Ubuntu's split, that was not the case. The bug is present only in the Mac OS version.
Consider the file zero.log, created with the following Common Lisp code (actually, for the rest of the post you don't need to understand the code):
The content of this file can be inspected with the hexdump command:
That is, the file has three zero bytes in the first line, right after the letter B and before the letter C. Now we want to split this file into one file per line.
The Linux version of split works as expected: it splits the file and keeps the zero bytes unchanged. Moreover, the total size of the x?? files equals the size of the zero.log file, 13 bytes.
However, the Mac OS version of split produces unexpected output. The letter F is merged into the beginning of the first line, although it belongs to the second line of the zero.log file. Besides that, the zero bytes cause the Mac OS split to ignore the rest of the first line of zero.log, resulting in a loss of data. The total size of the x?? files on Mac OS is only 6 bytes.
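The byte-count comparison above can be scripted so it is easy to repeat on both systems. The following is a minimal sketch, assuming a POSIX sh with printf, split, wc and cmp available. It does not reuse the original zero.log, whose exact bytes are not reproduced here; instead it builds a hypothetical 9-byte file, demo.log, with the same shape (a B, three zero bytes and a C on the first line, an F on the second). The file name and its contents are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: check that `split -l 1` keeps every byte of a file containing NULs.
# demo.log is a HYPOTHETICAL sample, not the original zero.log.
set -e

workdir=$(mktemp -d)
cd "$workdir"

# printf turns each \000 in the format string into a single zero (NUL) byte:
# line 1 is "AB", three NULs, "C"; line 2 is "F". Total: 9 bytes.
printf 'AB\000\000\000C\nF\n' > demo.log

# One line per output file; the default prefix "x" yields xaa, xab.
split -l 1 demo.log

# $(( )) strips the leading spaces that some wc implementations print.
original=$(( $(wc -c < demo.log) ))
pieces=$(( $(cat x?? | wc -c) ))

# cmp -s succeeds only if the concatenated pieces equal the input byte for byte.
if cat x?? | cmp -s - demo.log; then
  result=OK
else
  result=MISMATCH
fi

echo "original: $original bytes, pieces: $pieces bytes, result: $result"

# Clean up the temporary directory.
cd /
rm -rf "$workdir"
```

With GNU coreutils on Linux the script should report 9 bytes on both sides and OK. On an affected Mac OS version one would expect a smaller piece count and a mismatch, as happened with zero.log, although whether this smaller sample triggers exactly the same behavior is an untested assumption.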
I reported the bug to Apple using the Mac OS Feedback form.