
Unicode filenames in ZIP format

UTF-8 is a standard character encoding for Unicode, which was developed to represent the characters of many different languages.

The original ZIP format, created in the 1980s, did not support UTF-8 because Unicode and UTF-8 were developed later, in the 1990s.

However, as ZIP became the standard archive format, Unicode support became necessary, so two ways of storing UTF-8 filenames in ZIP files were introduced.

One method stores the filename itself as UTF-8 in the regular filename field; the other keeps a legacy-encoded filename there and stores the UTF-8 filename in the extra field. Bandizip supports both.



Use Unicode filenames in Zip files(UTF-8)

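When you use this setting, Bandizip stores filenames as UTF-8 in the regular filename field. This is a standard method defined in APPNOTE, the ZIP format specification, which marks such entries by setting bit 11 (the language encoding flag) of the general purpose bit flag.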

https://support.pkware.com/display/PKZIP/APPNOTE

However, some non-mainstream archivers do not recognize this format, so filenames may appear garbled in them.
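
As a rough illustration of how this flag can be inspected, the following Python sketch writes two empty entries to an in-memory ZIP file with the standard zipfile module and reads back bit 11 for each. It shows how Python's zipfile behaves (it sets the flag only for names that are not pure ASCII), not how Bandizip works internally; the helper name filename_flags and the sample filenames are made up for this example.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag


def filename_flags(names):
    """Write empty entries to an in-memory ZIP and report each entry's UTF-8 flag.

    Hypothetical helper for illustration only.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    # Re-open the archive and look at the flags recorded in the central directory.
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in zf.infolist()}


flags = filename_flags(["readme.txt", "한글.txt"])
print(flags)
```

An archiver that ignores this flag would decode the UTF-8 bytes with a legacy code page, which is how the garbled names mentioned above come about.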

Store Unicode filenames in an extra header field of Zip files(UTF-8)

When you use this setting, Bandizip stores filenames in MBCS (the old, code-page-based way) and also stores the Unicode filenames in the extra header field of the ZIP format. This method is also defined in APPNOTE, as the "Info-ZIP Unicode Path Extra Field."

Because this method uses the extra field of the ZIP format, the ZIP file can be tens to thousands of bytes larger than before, but it is safer than the former method: archivers that do not understand the extra field can still read the legacy filename.

All mainstream archivers, such as WinRAR, 7-Zip, and WinZip, support this method.
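
To give a feel for what this extra field contains, here is a minimal Python sketch that builds and parses one, assuming the layout given in APPNOTE: header ID 0x7075, a 2-byte data size, a 1-byte version (1), a CRC-32 of the legacy filename bytes, and then the UTF-8 filename. The function names and the choice of cp949 as the legacy encoding are assumptions for this example; it is not Bandizip's actual code.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path Extra Field header ID


def build_unicode_path_field(legacy_name: bytes, unicode_name: str) -> bytes:
    """Build the extra-field bytes for one entry (hypothetical helper)."""
    utf8_name = unicode_name.encode("utf-8")
    # Version 1, then the CRC-32 of the filename stored in the regular header field.
    payload = struct.pack("<BI", 1, zlib.crc32(legacy_name) & 0xFFFFFFFF) + utf8_name
    return struct.pack("<HH", UNICODE_PATH_ID, len(payload)) + payload


def parse_unicode_path_field(extra: bytes, legacy_name: bytes):
    """Return the UTF-8 name from an extra field, or None if absent or stale."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4:pos + 4 + size]
        if header_id == UNICODE_PATH_ID and len(body) >= 5:
            version, crc = struct.unpack_from("<BI", body, 0)
            # The CRC ties the Unicode name to the legacy name it replaces.
            if version == 1 and crc == (zlib.crc32(legacy_name) & 0xFFFFFFFF):
                return body[5:].decode("utf-8")
        pos += 4 + size
    return None


legacy = "한글.txt".encode("cp949")  # legacy (MBCS) name, assuming a Korean code page
field = build_unicode_path_field(legacy, "한글.txt")
recovered = parse_unicode_path_field(field, legacy)
print(len(field), recovered)
```

For this sample name the field is 19 bytes long (4 bytes of header, 5 bytes of version and CRC, and 10 bytes of UTF-8). That per-entry overhead is what makes the ZIP file slightly larger.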

This screenshot shows the difference between having the option on and off when a ZIP file compressed on a Korean OS is sent to a user on a Japanese OS.

[Screenshot: 7zip]
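
The Korean-to-Japanese case can be approximated in a few lines of Python. This is only a sketch: it assumes the Korean system writes the filename in cp949 and the Japanese system decodes it as cp932, and the sample filename is invented for the example.

```python
name = "한글.txt"

# Legacy path: the filename is written in the Korean code page...
legacy_bytes = name.encode("cp949")
# ...and read back with the Japanese code page, which has no Hangul.
garbled = legacy_bytes.decode("cp932", errors="replace")

# UTF-8 path: both sides agree on the encoding, so the name survives.
utf8_roundtrip = name.encode("utf-8").decode("utf-8")

print(garbled)
print(utf8_roundtrip)
```

The first printed name no longer matches the original, while the UTF-8 round trip does, which is the difference the option is meant to address.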



Use Unicode filenames in tar/tgz files(UTF-8)

TAR and TGZ are archive formats widely used on Unix OSes.

If you turn on this option, TAR/TGZ files can be extracted on Unix without filename issues, because Unix OSes typically use UTF-8 for filenames.
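
As an illustration of what a UTF-8 filename looks like inside a TGZ file, the sketch below uses Python's standard tarfile module to write one entry with encoding="utf-8" and then checks the raw name field in the tar header. The GNU header format is chosen here only so that the name sits directly in the first 100 bytes of the header; the sample filename and contents are invented, and this does not describe Bandizip's own implementation.

```python
import gzip
import io
import tarfile

name = "한글.txt"
data = b"hello"

# Write a .tgz in memory, encoding the filename as UTF-8 in the tar header.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz",
                  format=tarfile.GNU_FORMAT, encoding="utf-8") as tf:
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

# The name field is the first 100 bytes of the header, padded with NUL bytes.
raw_header = gzip.decompress(buf.getvalue())[:100]
stored_name = raw_header.split(b"\0", 1)[0]

# Read the archive back the way a UTF-8 based system would.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz", encoding="utf-8") as tf:
    names = tf.getnames()

print(stored_name == name.encode("utf-8"), names)
```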