The blagotube home of Sune Kirkeby. Here are rambling rants and some bits of code.

06. Feb
2004

Bastard zero bytes last seen trailing ID3v2 tags

In this here entry I will rant a bit about some of the stupidity that ID3v2 tags seem to have forced upon the world. First, though, a primer on ID3 tags is probably in order.

ID3 tags

ID3v1 tags have a fixed length and layout (i.e. fixed-length records), and they are always tacked onto the end of MP3 files. This is both a blessing and a curse. It is a blessing because ID3-aware applications have a very easy way of getting at them: just seek to the last 128 bytes of an MP3 file and see if the first three bytes there are the string TAG. It is a curse because the fields are fixed length (e.g. 30 bytes for the song title), so if you have a song title that is longer than the length allowed, its tail will be chopped off.
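To make the seek-and-check concrete, here is a minimal Python sketch. The function name read_id3v1 is made up for illustration and is not taken from my MP3 tools. The field offsets follow the usual ID3v1 layout (30-byte title, artist and album, 4-byte year, 30-byte comment, 1-byte genre), and decoding as Latin-1 is an assumption.

```python
import io


def read_id3v1(f):
    """Return a dict of ID3v1 fields, or None if there is no ID3v1 tag.

    `f` is a binary file object opened for reading (hypothetical helper).
    """
    f.seek(0, io.SEEK_END)
    if f.tell() < 128:
        return None  # too short to hold a tag at all
    f.seek(-128, io.SEEK_END)
    block = f.read(128)
    if block[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are padded with zero bytes (or spaces) up to their fixed width.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    return {
        "title": text(block[3:33]),
        "artist": text(block[33:63]),
        "album": text(block[63:93]),
        "year": text(block[93:97]),
        "comment": text(block[97:127]),
        "genre": block[127],
    }
```

Called as read_id3v1(open("song.mp3", "rb")), this never looks at more than the last 128 bytes, which is exactly the blessing. The 30-byte slices are exactly the curse.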

Naturally, this could not continue, and ID3v2 tags were introduced. ID3v2 is a very different beast from ID3v1; an ID3v2 tag is variable length and consists of a number of variable-length parts (e.g. song title). Again, this is both a blessing and a curse. Now, though, the reasons are reversed. It is a blessing that there is no practical limit to the length of, for example, song titles. But, with ID3v2 tags it is impossible to tack them onto the end of MP3 files, since you cannot know their length in advance (and so cannot find them by seeking to a fixed length from the end of MP3 files). So, ID3v2 tags are put at the beginning of MP3 files.
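Since the tag sits at the front, a reader learns its length from the tag's own 10-byte header: the string ID3, two version bytes, one flags byte, and a four-byte size in which only the low seven bits of each byte are used. A rough Python sketch of that calculation follows. The name id3v2_tag_size is hypothetical, and the sketch deliberately ignores the optional footer that ID3v2.4 allows.

```python
def id3v2_tag_size(header):
    """Return the total tag size in bytes (10-byte header included).

    Returns None if `header` does not look like an ID3v2 header.
    Assumption: no ID3v2.4 footer is present.
    """
    if len(header) < 10 or header[:3] != b"ID3":
        return None
    size_bytes = header[6:10]
    if any(b & 0x80 for b in size_bytes):
        return None  # the high bit must be clear in every size byte
    size = 0
    for b in size_bytes:
        size = (size << 7) | b  # 7 useful bits per byte
    return 10 + size
```

With this, the first MP3 frame ought to start at offset id3v2_tag_size(first_ten_bytes) in the file.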

The Problem

Now, we come to the problem. It seems that (at least) one MP3 encoder writes ID3v2 tags with a whole lot of trailing zero bytes. This is not a problem for most MP3 players, because they quietly skip over junk in MP3 files. Alas, it is a problem for me, since I would like to write simple and efficient Python code to parse MP3 files. Besides, leaving junk in my data files goes against my sense of aesthetics.
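For the curious, this is roughly what "quietly skipping over junk" looks like: scan forward from where the tag ends until something resembling an MPEG audio frame sync shows up (a 0xFF byte followed by a byte whose top three bits are set). The sketch below is only an illustration of that idea; find_frame_sync is a made-up name, and a real player would also validate the rest of the frame header, since this pattern can turn up by accident.

```python
def find_frame_sync(data, start=0):
    """Return the offset of the first plausible frame sync at or after
    `start`, or -1 if there is none. `data` is a bytes object."""
    pos = data.find(b"\xff", start)
    while pos != -1 and pos + 1 < len(data):
        if (data[pos + 1] & 0xE0) == 0xE0:
            return pos  # 11 sync bits set: 0xFF plus top 3 bits of next byte
        pos = data.find(b"\xff", pos + 1)
    return -1
```

If tag_end is the offset where the ID3v2 tag claims to stop, then find_frame_sync(data, tag_end) - tag_end is the amount of junk in between. In a clean file that difference is zero.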

The reason for putting zero bytes after ID3v2 tags is not clear, but I do have a few theories. Currently, I lean towards the belief that the zero bytes are there so minor changes can be made to the tags without rewriting the entire MP3 file. It’s the closest I have to an explanation; there is just one problem with it: it is a dumb reason. Really, really dumb, even.

Post Scriptum

I decided to solve The Problem by side-stepping it. Instead of having only one MP3 parser, I decided to have two different ones: one simple and efficient parser for good MP3 files, and one (slightly less) simple and inefficient parser for bad MP3 files. The bad-MP3-file parser I only use for repairing MP3 files.
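The split can be illustrated with a small Python sketch. To be clear, this is not the code from my tools, just the shape of the idea under simplifying assumptions: the whole file is held in memory as bytes, tag_end is the offset where the ID3v2 tag ends, and the only junk is a run of zero bytes. All names here (BadMP3Error, strict_audio_start, strip_zero_bytes) are made up.

```python
class BadMP3Error(ValueError):
    """Raised by the strict parser when the file is not clean."""


def strict_audio_start(data, tag_end):
    """Strict parser: demand a frame sync right where the tag ends."""
    ok = (tag_end + 1 < len(data)
          and data[tag_end] == 0xFF
          and (data[tag_end + 1] & 0xE0) == 0xE0)
    if not ok:
        raise BadMP3Error("no MPEG frame sync at offset %d" % tag_end)
    return tag_end


def strip_zero_bytes(data, tag_end):
    """Repair step: drop the run of zero bytes that follows the tag."""
    pos = tag_end
    while pos < len(data) and data[pos] == 0:
        pos += 1
    return data[:tag_end] + data[pos:]
```

The strict parser stays simple because it is allowed to give up; a file that makes it raise BadMP3Error gets run through the repair step once, and after that the strict parser is happy with it.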

See my Python MP3 Tools for the MP3 parsers and the MP3-file repairing script.

This post was written by Sune Kirkeby on 2004-02-06, and claimed to be mostly about rambling.