In this here entry I will rant a bit about some of the stupidity that
ID3v2 tags seem to have forced upon the world. First, though, a primer
on ID3 tags is probably in order.
ID3 tags
ID3v1 tags have a fixed length and layout (i.e. fixed-length
records), and they are always tacked onto the end of MP3 files. This is
both a blessing and a curse. It is a blessing because ID3-aware
applications have a very easy way of getting at them, just seek to the
last 128 bytes of a MP3 file and see if the first three bytes there are
the string TAG
. It is a curse because the fields are fixed
length (e.g. 30 bytes for the song title), so if you have a song title
that is longer than the length allowed its tail will be chopped off.
Naturally, this could not continue. And, ID3v2 tags where introduced.
ID3v2 is a very different beast from ID3v1; an ID3v2 tag is variable
length and consists of a number of variable length parts (e.g. song
title). Again, this is both a blessing and a curse. Now, though, the
reasons are reversed. It is a blessing that there is no practical limit
to the length of for example song titles. But, with ID3v2 tags it is
impossible to tack them onto the end of MP3 files, since you cannot know
their length in advance (and so cannot find them by seeking to a fixed
length from the end of MP3 files). So, ID3v2 tags are put at the
beginning of MP3 files.
The Problem
Now, we come to the problem. It seems that (at least) one MP3 encoder
writes ID3v2 tags with a whole lot of trailing zero bytes. This is not a
problem for most MP3 players, because they quietly skip over junk in MP3
files. Alas, it is a problem for me, since I would like to
write simple and efficient Python code to parse MP3 files. Besides,
leaving junk in my data files goes against my sense of aesthetics.
The reason for putting zero bytes after ID3v2 tags is not clear, but
I do have a few theories. Currently, I lean towards the belif that the
zero bytes are there so minor changes can be made to the tags without
rewriting the entire MP3 file. It’s the closest I have to an
explanation, there is just one problem with it: It is a dumb
reason. Really, really dumb, even.
Post Scriptum
I decided to solve The Problem by side-stepping it. Instead
of having only one MP3 parser I decided to have to different ones. One
simple and efficient parser for good MP3 files. One (slightly less)
simple and inefficient parser for bad MP3 files. The bad-MP3-file
parser I only use for repairing MP3 files.
See my Python MP3 Tools for
the MP3 parsers and the MP3-file repairing script.