MP3 songs, music tracks and other audio files contain a built-in metadata container called ID3, which stores information such as title, artist, album and track number in the MP3 file itself. ID3 tags allow software media players such as Windows Media Player, iTunes and WinAmp, and hardware players such as the iPod, Zune and other MP3 players, to recognize and display the track details, whether for file management on a computer or on the LCD screen of the gadget.

The problem is that ID3 tags that were not entered on the local computer may use a different code page for character encoding, such as English, Traditional Chinese, Simplified Chinese, Korean, Japanese, Arabic, Thai, Cyrillic, Greek, Hebrew, Celtic, Baltic, Latin or Polish. When the encoding used for the text in the ID3 tags differs from the one the media player assumes (software media players normally use the system locale set in the operating system), some characters and symbols cannot be displayed properly or become unreadable, because the player interprets the stored bytes with the wrong code page.

When music tracks, songs or audio clips in MP3 format carry tags encoded with traditional (legacy) charsets, especially for non-Western European languages, the text may be displayed as gibberish, as question marks or squares, or as other garbage characters. The same encoding problem affects playlists that contain MP3 tracks tagged with a traditional character encoding.
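The effect can be reproduced in a few lines of Python. This is only an illustrative sketch: the sample title and the choice of CP1251 and Latin-1 as the two code pages are assumptions made for the example, not taken from any particular file or player.

```python
# A Cyrillic title as it might be typed on a system using the CP1251 code page.
title = "Привет"

# A legacy tagger stores the text as raw bytes in the local code page.
raw = title.encode("cp1251")

# A player that assumes Western European (Latin-1) text misreads the same bytes.
garbled = raw.decode("latin-1")
print(garbled)  # Ïðèâåò

# Knowing the original code page, the damage is reversible:
# turn the garbled text back into bytes, then decode with the right charset.
repaired = garbled.encode("latin-1").decode("cp1251")
print(repaired == title)  # True
```

The bytes in the file never change. Only the code page used to interpret them does, which is why a converter needs to be told the original encoding.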

The issue can be solved if the ID3 tags are written and stored in a Unicode character encoding (such as UTF-8), which covers the characters of virtually all languages in the world and is supported by modern operating systems and by most software and hardware media players. In some players, the user can force the encoding to a specific code page, but then it’s impossible to display tags in several languages at the same time if the files use different encodings.
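For background, ID3v2 text frames begin with a one-byte marker that declares the encoding of the text that follows: 0 for ISO-8859-1 and 1 for UTF-16 with a byte order mark, with ID3v2.4 adding 2 for UTF-16BE and 3 for UTF-8. The sketch below decodes only the body of such a frame (no frame header or tag parsing); the function name and sample data are hypothetical and chosen for illustration.

```python
# Encoding markers defined for ID3v2 text frames (2 and 3 exist only in v2.4).
ID3V2_TEXT_ENCODINGS = {
    0: "latin-1",    # ISO-8859-1
    1: "utf-16",     # UTF-16 with byte order mark
    2: "utf-16-be",  # UTF-16 big endian, no byte order mark
    3: "utf-8",
}

def decode_text_frame_body(body: bytes) -> str:
    """Decode a text frame body: one encoding byte followed by the text."""
    codec = ID3V2_TEXT_ENCODINGS[body[0]]
    # Strip any trailing null terminator left after decoding.
    return body[1:].decode(codec).rstrip("\x00")

# A legacy tagger wrote CP1251 bytes but marked them as ISO-8859-1.
legacy_body = b"\x00" + "Привет".encode("cp1251")
# A converted tag declares and uses a Unicode encoding.
unicode_body = b"\x03" + "Привет".encode("utf-8")

print(decode_text_frame_body(legacy_body))   # garbled
print(decode_text_frame_body(unicode_body))  # readable on any locale
```

Once the marker and the bytes agree on a Unicode encoding, the player no longer has to guess a code page from the system locale.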

Users can manually re-type the ID3 tags in a Unicode encoding for each and every MP3. However, it’s much easier to let a software utility convert the character encoding of the ID3 tags to Unicode automatically. Here are a few utilities that can do the conversion.

Chacon (foo_chacon) for foobar2000

Chacon (short for charset convertor) is a simple tool for fixing tags by converting them between different character sets. The function is available directly from the context menu for any number of tracks at once: right click the selection and choose Tagging -> Fix Metadata Charset…. It’s similar to the “Override charset” option in foo_infobox, which is incompatible with newer versions of foobar2000. The component is typically used to fix ID3v1 tags or cue sheets saved in a code page different from that of your system, as well as to perform more complex restoration of files mangled by programs that write incompatible or incorrect tags.

Foo_Chacon is a component for the foobar2000 audio player, and thus requires foobar2000 to be installed.

Download foo Chacon (version 3): foo_chacon-v3.zip

To install the Foo_Chacon component, open the foobar2000 preferences dialog, go to the Components page and click the “Install…” button, or simply drag the component archive onto the list.

ID3iconv

ID3iconv is a Java command line tool that converts ID3 tags in MP3 files from any machine encoding to Unicode. It converts both ID3v1 and ID3v2 tags to Unicode-encoded ID3v2 (v2.3 or v2.4), and handles multi-byte source encodings such as GBK or Big5. As it’s Java based, it runs on Windows, Mac OS X, Linux and most other platforms.

Download the ID3iconv (version 0.2.1): id3iconv-0.2.1.jar

Run the following command to convert MP3 files:

java -jar <ID3iconv .jar binary> -e <source MP3 character encoding> <MP3 file names>

The -e switch specifies the character encoding of the source MP3 tags, using Java encoding names (for example GBK or Big5). The encoding must be correct and reflect the original charset, or else the conversion may produce erroneous results. If no encoding is specified, the default OS encoding is used.
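As a concrete illustration of the command template above, the following sketch assembles an invocation for a batch of files. The helper name, the MP3 file names and the choice of GBK are hypothetical; only the `java -jar … -e …` form comes from the instructions above.

```python
def build_id3iconv_command(jar_path, source_encoding, mp3_files):
    """Assemble the ID3iconv command line as an argument list."""
    return ["java", "-jar", jar_path, "-e", source_encoding, *mp3_files]

# Hypothetical example: two tracks tagged in Simplified Chinese (GBK).
cmd = build_id3iconv_command(
    "id3iconv-0.2.1.jar",
    "GBK",
    ["track01.mp3", "track02.mp3"],
)
print(" ".join(cmd))
# java -jar id3iconv-0.2.1.jar -e GBK track01.mp3 track02.mp3
```

With Java installed and the jar in the working directory, the list could be passed to `subprocess.run(cmd, check=True)`. Working on copies of the files first is a sensible precaution, since a wrong -e value yields wrong tags.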

Unicode Rewriter

Essentially a graphical user interface (GUI) for ID3iconv, with batch conversion support.

Download Unicode Rewriter (version 0.1): UnicodeRewriter-Installer-01.jar

To install Unicode Rewriter on Windows 7 or Vista, open an elevated Command Prompt window (run as administrator) and execute “start UnicodeRewriter-Installer-01.jar”. On other versions of Windows or on Mac OS X, just double click the .jar file. On Unix and Linux, execute “java -jar UnicodeRewriter-Installer-01.jar”.

MP3 Tag to Unicode Converter

A small command line batch script that uses ID3iconv in interactive mode to scan the specified path for MP3 files, and convert them accordingly.

Download id3iconv.zip

Run id3iconv.bat for interactive mode, or pass the path to the MP3 files directly as a parameter. Users can edit id3iconv.cmd to set the path to the Java executable and the ANSI encoding of the MP3 files if it differs from the system default. Again, the original character encoding of the MP3 tags must be specified correctly for the conversion to Unicode to be valid.

Tag2Utf Cyrillic MP3-Tags Decoder

A Python based open source tool for converting MP3 tags stored in Cyrillic charsets (CP1251, KOI8-R) to Unicode. It supports Linux and Unix systems with Python installed. Although only Cyrillic character sets are supported by default, users can modify the script to convert tags in other character encodings.
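The general idea of choosing between candidate charsets can be sketched as follows. This is not tag2utf's actual code, only an assumed illustration of the technique: decode the same raw tag bytes with each candidate and let the user pick the readable result. The function name and sample title are hypothetical.

```python
def preview_decodings(raw, candidates=("cp1251", "koi8_r")):
    """Decode the same raw tag bytes with each candidate charset."""
    previews = {}
    for name in candidates:
        try:
            previews[name] = raw.decode(name)
        except UnicodeDecodeError:
            # Multi-byte charsets such as Big5 or GBK can reject invalid input.
            previews[name] = None
    return previews

# Sample bytes: a title that was actually stored in KOI8-R.
raw = "Привет".encode("koi8_r")
for charset, text in preview_decodings(raw).items():
    print(charset, "->", text)
```

Supporting another language then amounts to extending the candidate list, for example with `"big5"` or `"gbk"`. Single-byte charsets rarely raise an error on wrong input, so a human check of the preview remains necessary.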

Download tag2utf (version 0.16): tag2utf-0.16.py

Run the following command to convert MP3 ID3 tags to Unicode:

tag2utf.py <MP3 directory>

All MP3s inside the directory, including its sub-directories, will have their ID3 tags converted.
