Jul 16, 2013

Reading MP3 ID3 Tags in Native PHP

This week I went crazy about file formats. I tried to understand the specifications of several popular ones, such as MP3, FLV, and PDF. It's amazing that no matter how complex these technologies are, or how clever the algorithms they use to store media efficiently, at the lowest level each is just an arrangement of bits that makes sense. With a bit of experimentation and hacking around the MP3 format (a hex editor is an invaluable tool for this), I was able to read ID3 tags in PHP without using any extension. The source has been put on GitHub.

Binary File Reader

The native way to decode binary data in PHP is unpack(). My problem with it was that it didn't seem to handle variable-length chunks, and I found its packing format codes tough to understand, so I wrote my own binary file reader. Unluckily, I realized quite late (damn!) that I could have built the reader more efficiently on top of unpack(). (Gist)
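As a rough illustration of how unpack() can cope with variable-length chunks after all, here is a minimal sketch that decodes in two steps: read the length first, then use it as the repeat count. The helper name readLengthPrefixed() and the record layout (a 4-byte big-endian length followed by the payload) are assumptions made up for this example, not part of the gist.

```php
<?php
// Hypothetical helper: read a record made of a 4-byte big-endian length
// followed by that many bytes of payload.
function readLengthPrefixed(string $bytes): string
{
    if (strlen($bytes) < 4) {
        throw new InvalidArgumentException('Missing 4-byte length prefix');
    }
    // Step 1: "N" is an unsigned 32-bit big-endian integer, stored under key "size".
    $size = unpack('Nsize', $bytes)['size'];
    if ($size === 0) {
        return '';
    }
    if (strlen($bytes) < 4 + $size) {
        throw new InvalidArgumentException('Payload is shorter than declared');
    }
    // Step 2: the decoded size becomes the repeat count of the "a" (raw string)
    // code. The third argument (PHP 7.1+) is the byte offset to start from.
    return unpack("a{$size}body", $bytes, 4)['body'];
}

echo readLengthPrefixed("\x00\x00\x00\x05hello world"), "\n";
```

The format string is built at run time, which is what makes the chunk length variable.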

A Background on ID3 Tags

Like I said, tags are nothing but an arrangement of bytes that makes sense. As the official spec describes, the first three bytes are fixed: “ID3”. The next two bytes declare the version, one byte holds flags, and the next four bytes give the total length of the tags that follow. I did not find much use for these first 10 bytes, and the purpose of the flag byte in particular was obscure to me.
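The 10-byte header described above can be decoded with plain PHP along these lines. This is a sketch, not the code from the repository: parseId3Header() and decodeSynchsafe() are hypothetical names. It relies on one detail from the ID3v2 spec, namely that the header size is a "synchsafe" integer in which only the low 7 bits of each byte count. In ID3v2.3 the top three bits of the flag byte signal unsynchronisation, an extended header, and an experimental tag.

```php
<?php
// Decode a 4-byte synchsafe integer: each byte contributes its low 7 bits.
function decodeSynchsafe(string $fourBytes): int
{
    $b = array_values(unpack('C4', $fourBytes)); // four unsigned bytes
    return (($b[0] & 0x7F) << 21) | (($b[1] & 0x7F) << 14)
         | (($b[2] & 0x7F) << 7)  |  ($b[3] & 0x7F);
}

// Hypothetical helper: returns null when the bytes are not an ID3v2 header.
function parseId3Header(string $header): ?array
{
    if (strlen($header) < 10 || substr($header, 0, 3) !== 'ID3') {
        return null;
    }
    // Bytes 3-4 are the version (major, revision), byte 5 is the flag byte.
    $f = unpack('Cmajor/Crevision/Cflags', substr($header, 3, 3));
    return [
        'version' => '2.' . $f['major'] . '.' . $f['revision'],
        'flags'   => $f['flags'],
        'size'    => decodeSynchsafe(substr($header, 6, 4)),
    ];
}

// Hand-built sample header for illustration: ID3v2.3.0, no flags, size bytes 00 00 02 01.
$info = parseId3Header("ID3\x03\x00\x00\x00\x00\x02\x01");
printf("ID3v%s, %d bytes of tag data\n", $info['version'], $info['size']);
```

For small tags a plain big-endian read gives the same number, which is why treating the size as an ordinary integer often appears to work.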

What follows is a series of frames, each with a header and a body, which hold the actual content. The header has four characters for the frame ID, followed by four bytes for the size of the body and two bytes for flags; the body comes right after. The picture below makes this clearer.

Hex Edit for MP3 File

For the TCON frame, 00 00 00 0C is the size of the body (12 bytes), the two bytes after the size hold the flag bits described in the spec, and the next 12 bytes (the text “Heavy Metal” preceded by a one-byte text-encoding marker) form the body. Many such frames make up the information about the MP3 file. Some frames, like APIC, which holds the album art, have further formatting inside their body.
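To make the frame layout concrete, here is a small sketch that splits a single frame into its header fields and body. The helper parseFrame() is a hypothetical name, the sample bytes are built by hand rather than copied from a real file, and the size is read as a plain big-endian integer as in ID3v2.3.

```php
<?php
// Hypothetical helper: split one ID3v2.3 frame into header fields and body.
function parseFrame(string $bytes): array
{
    if (strlen($bytes) < 10) {
        throw new InvalidArgumentException('A frame header is 10 bytes long');
    }
    // a4 = 4-byte frame ID, N = 32-bit big-endian size, n = 16-bit flags.
    $h = unpack('a4id/Nsize/nflags', $bytes);
    return [
        'id'    => $h['id'],
        'size'  => $h['size'],
        'flags' => $h['flags'],
        'body'  => substr($bytes, 10, $h['size']),
    ];
}

// Hand-built sample: TCON, size 0x0C, zero flags, then a 12-byte body.
$frame = parseFrame("TCON\x00\x00\x00\x0C\x00\x00\x00Heavy Metal");

// Text frames start with a one-byte encoding marker (0x00 = ISO-8859-1),
// so the readable text begins at the second byte of the body.
$text = substr($frame['body'], 1);
echo $frame['id'], ' => ', $text, "\n";
```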

Constructing an ID3 Reader

Once you understand the spec, creating a reader is very simple. The first step is to read the header bytes.

$this->_FileReader = new BinaryFileReader($FileHandle, array(
    "ID3" => array(BinaryFileReader::FIXED, 3),
    "Version" => array(BinaryFileReader::FIXED, 2),
    "Flag" => array(BinaryFileReader::FIXED, 1),
    "SizeTag" => array(BinaryFileReader::FIXED, 4, BinaryFileReader::INT),
));

The constructor in ID3Tags_Reader.php initializes a BinaryFileReader object with a map of the first 10 bytes. As explained, ID3 is a fixed 3-byte string, followed by the version, the flag byte, and the total size of the tag body (which is cast to an integer). Once the header is read, we can start reading tags.

The ReadAllTags() method defines a similar map for reading frames:

$this->_FileReader->SetMap(array(
    "FrameID" => array(BinaryFileReader::FIXED, 4),
    "Size" => array(BinaryFileReader::FIXED, 4, BinaryFileReader::INT),
    "Flag" => array(BinaryFileReader::FIXED, 2),
    "Body" => array(BinaryFileReader::SIZE_OF, "Size"),
));

“Body” uses an option that defines a variable-length string whose length depends on “Size” (remember to cast “Size” to an integer). A while loop then reads all the tags defined in the $ID3Tags array.
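For comparison, the same walk over the frames can be sketched without BinaryFileReader. The function readAllFrames() below is a hypothetical stand-in, not the ReadAllTags() method from the repository. It assumes ID3v2.3 frame headers with plain big-endian sizes and takes the tag body (the bytes after the 10-byte ID3 header) as a string.

```php
<?php
// Hypothetical helper: collect frame bodies keyed by frame ID.
function readAllFrames(string $tagBody): array
{
    $frames = [];
    $offset = 0;
    $length = strlen($tagBody);

    // Each iteration needs at least a full 10-byte frame header.
    while ($offset + 10 <= $length) {
        $h = unpack('a4id/Nsize/nflags', $tagBody, $offset);

        // Padding after the last frame is zero bytes, so a NUL in the ID
        // ends the walk. A size running past the end means corrupt data.
        if ($h['id'][0] === "\x00" || $offset + 10 + $h['size'] > $length) {
            break;
        }

        $frames[$h['id']] = substr($tagBody, $offset + 10, $h['size']);
        $offset += 10 + $h['size'];
    }
    return $frames;
}

// Hand-built sample: two text frames followed by 20 bytes of padding.
$sample = "TIT2" . pack('N', 6)  . "\x00\x00" . "\x00Hello"
        . "TCON" . pack('N', 12) . "\x00\x00" . "\x00Heavy Metal"
        . str_repeat("\x00", 20);

foreach (readAllFrames($sample) as $id => $body) {
    echo $id, ': ', substr($body, 1), "\n";
}
```

Keying by frame ID keeps only the last frame of each kind, which is fine for a sketch but would drop repeated frames such as multiple pictures.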

Reading Album Art

The album art, officially the Attached Picture (APIC) frame, is the picture of the album or song that we see in our music players. The body of APIC has a special format described in the spec. The problem in reading it was how to create a file handle from a string for BinaryFileReader. While this could easily have been done with unpack(), I would not let my work go unnoticed :).

PHP provides a way to create artificial streams without using files. They are flexible enough that you can create them out of strings, HTTP resources, standard input, and so on. To create a stream here, we can simply use “data://”:

//Create an artificial stream from Image data
$fp = fopen('data://text/plain;base64,'.base64_encode($this->_ID3Array["APIC"]["Body"]), 'rb');
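Here is a minimal, self-contained sketch of the same trick, showing that the resulting handle behaves like any other file handle. The helper name streamFromString() and the sample bytes are made up for illustration.

```php
<?php
// Hypothetical helper: expose a binary string as a read-only stream.
function streamFromString(string $bytes)
{
    // Base64 keeps arbitrary binary data safe inside the data: URI.
    return fopen(
        'data://application/octet-stream;base64,' . base64_encode($bytes),
        'rb'
    );
}

$fp = streamFromString("\x03rest of the data");

$firstByte = ord(fread($fp, 1));       // read one byte, as from a file
$remainder = stream_get_contents($fp); // then everything up to EOF
fclose($fp);

printf("first byte: %d, remainder: %s\n", $firstByte, $remainder);
```

An alternative worth considering is opening php://memory, writing the string to it, and calling rewind(), which avoids the base64 round trip.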

To read the image data, we can use this map:

$fileReader = new BinaryFileReader( $fp, array(
    "TextEncoding" => array(BinaryFileReader::FIXED, 1),
    "MimeType" => array(BinaryFileReader::NULL_TERMINATED),
    // The APIC layout in the spec has a one-byte picture type here, not a file name
    "PictureType" => array(BinaryFileReader::FIXED, 1),
    "ContentDesc" => array(BinaryFileReader::NULL_TERMINATED),
    "BinaryData" => array(BinaryFileReader::EOF_TERMINATED)
));

MimeType and ContentDesc have no fixed size; they are just null-terminated strings. Between them the spec places a one-byte picture type (for example, 0x03 for the front cover). BinaryData, which holds the main image content, is whatever remains of the stream.
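The same split can be sketched with plain string functions, which may help when checking results against a hex editor. This is not the repository's code: parseApicBody() is a hypothetical name, and the sketch assumes the ID3v2.3 APIC layout (text encoding, MIME type, one-byte picture type, description, picture data) with a single-byte text encoding, so that each string ends at one NUL byte.

```php
<?php
// Hypothetical helper: split an APIC frame body into its parts.
// Assumes a single-byte text encoding; UTF-16 descriptions end in two NULs
// and are not handled here.
function parseApicBody(string $body): array
{
    $mimeEnd = strpos($body, "\x00", 1);
    if ($mimeEnd === false || $mimeEnd + 1 >= strlen($body)) {
        throw new InvalidArgumentException('Truncated APIC body');
    }
    $descStart = $mimeEnd + 2; // skip the NUL and the picture-type byte
    $descEnd = strpos($body, "\x00", $descStart);
    if ($descEnd === false) {
        throw new InvalidArgumentException('Unterminated description');
    }
    return [
        'encoding'    => ord($body[0]),
        'mimeType'    => substr($body, 1, $mimeEnd - 1),
        'pictureType' => ord($body[$mimeEnd + 1]),
        'description' => substr($body, $descStart, $descEnd - $descStart),
        'data'        => substr($body, $descEnd + 1),
    ];
}

// Hand-built sample: the "image" is only the first few bytes of a JPEG header.
$apic = parseApicBody("\x00image/jpeg\x00\x03Cover\x00\xFF\xD8\xFF\xE0\x00\x10JFIF");
printf("%s, type %d, \"%s\", %d bytes\n",
    $apic['mimeType'], $apic['pictureType'], $apic['description'], strlen($apic['data']));
```

Writing $apic['data'] to a file with file_put_contents() would then produce the image, given a real frame body.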

