Fall County

Introduction

I've been spending a lot of time lately wondering what the actual difference is between tar and zip files, like you do, and I'm not... quite sure about the whole thing. When I say a lot of time, I do mean that I saw an xkcd comic reminding me that tar can be a pain in the ass to use, and this made me wonder why we bother anymore. So I decided to do some digging, and I found some things! These things are getting added to my personal notes, and I figured I should share them with y'all.

Also, here's a table with all of the differences:

| Archive Type | .tar | .zip | .tar.gz |
|---|---|---|---|
| Size | Third | Second | First |
| Random Access | Exists | Good | Exists, technically |
| Attributes kept | Unix | MSDOS (native), Unix (depends) | Unix |
| Support | Linux (Native, GUI/CLI), Windows (Native, CLI) | Windows (Native, GUI/CLI), Linux (Non-native, GUI/CLI) | Linux (Native, GUI/CLI), Windows (Non-native, GUI) |
| Biggest strength | Local performance | Modification | Transferring |

Linux is mentioned because it's my daily driver.[1]
Windows is mentioned because of market share.
MacOS isn't mentioned because I've never used it.[2]

Definitions

We're gonna start with some definitions, since that'll make things more clear:

File: An individual collection of data, like a text file, word processor document, image, etc.

Directory: Logical management for files, think your Documents directory in your home directory. These are called folders in Windows, I think.

Archive: A directory that gets treated like a file, rather than as a directory. This gets muddy later on, but all definitions do.

Tape Archive file (.tar)

That stands for something holy shit.

This is the largest archive type of the ones we'll be discussing. There may be larger ones I'm not aware of. It's large because there's no compression on it by default. This won't necessarily be an issue if you're just keeping the archive locally for backup purposes, but it may be an issue if you plan on transferring that archive to a different computer, especially over the internet.
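To put rough numbers on the "no compression" point, here's a minimal Python sketch using the standard library's `tarfile` module. The file names and contents are made up for illustration, and everything stays in memory, so treat it as a toy rather than a benchmark.

```python
import io
import tarfile


def build_tar(files, mode="w"):
    """Pack a {name: bytes} dict into an in-memory archive and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Made-up sample data: three small, very repetitive "notes" files.
files = {f"notes/{i}.txt": b"tar just collects files\n" * 40 for i in range(3)}

payload = sum(len(data) for data in files.values())
plain = build_tar(files, "w")        # plain .tar: headers and padding, no compression
gzipped = build_tar(files, "w:gz")   # the same archive run through gzip

print("raw file bytes:", payload)
print("plain .tar    :", len(plain))
print(".tar.gz       :", len(gzipped))
```

The plain tar should come out bigger than the files it holds, since every file gets a header and padding on top of its data.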

Tar does, on paper, have random file access; however, it does not have an index, so to find a file the archive has to be read from the start, one header at a time, until that file turns up. This means the computer doesn't have a single place to look that tells it where the files are, so it has to go find them itself. This is fine on smaller tar files, and on systems with higher input/output (I/O) on their drives, but on some PCs this will result in a massive performance hit.
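Here's a small sketch of what "no index" looks like in practice, again with Python's `tarfile`. The `find_member` helper and the file names are mine, purely for illustration; the point is just that finding a file means walking the headers in order until it shows up.

```python
import io
import tarfile

# Build a tiny in-memory tar with made-up file names.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("a.txt", "b.txt", "c.txt"):
        data = f"contents of {name}\n".encode()
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
archive = buf.getvalue()


def find_member(archive_bytes, wanted):
    """Walk the headers front to back; return (headers_read, TarInfo or None)."""
    headers_read = 0
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tar:
        for member in tar:  # iterating reads one header at a time
            headers_read += 1
            if member.name == wanted:
                return headers_read, member
    return headers_read, None


for name in ("a.txt", "c.txt", "missing.txt"):
    steps, member = find_member(archive, name)
    where = f"header at byte {member.offset}" if member else "not found"
    print(f"{name}: read {steps} header(s), {where}")
```

The further back a file sits, the more headers get read first, and a file that isn't there costs a trip through the whole archive.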

This format also retains Unix file attributes, notably things like permissions, universally, whereas other formats may not.
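A quick way to see those attributes riding along, sketched with Python's `tarfile`. The script name, owner, and group below are hypothetical values I picked for the example.

```python
import io
import stat
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"#!/bin/sh\necho hi\n"
    info = tarfile.TarInfo(name="run.sh")   # hypothetical script
    info.size = len(data)
    info.mode = 0o750                       # rwxr-x---
    info.uname = "alice"                    # hypothetical owner
    info.gname = "staff"                    # hypothetical group
    tar.addfile(info, io.BytesIO(data))

# Read the archive back and look at what the header kept.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:") as tar:
    stored = tar.getmember("run.sh")

permissions = stat.filemode(stat.S_IFREG | stored.mode)
print(permissions, stored.uname, stored.gname)
```

The mode bits and the owner and group names are all stored in the tar header itself, so they come back out without any special tooling.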

Tar is supported by Linux out of the box, and is now supported by Windows out of the box. On Windows, however, there's no GUI support that I can find for tar files, which is going to be a drawback since most Windows users aren't familiar with the command line. From what I'm seeing, using Thunar and XArchiver there is GUI support on Linux though.

Zone Improvement Plan (.zip)

Okay it doesn't actually stand for that, at least not in computer terms. As far as I can tell it's just because of zippers.

Zip archives will usually be smaller than .tar files, which makes them easier to transfer around, since .tar doesn't compress the files, it just collects them.

Zip archives store MSDOS attributes by default. These are things like archive, read-only, system, and hidden. They can also store Unix attributes with certain types of software.
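For the curious, here's a sketch of where those attributes live, using Python's `zipfile`. It assumes the common convention (the one Python's own `zipfile` follows) of putting the Unix mode in the high 16 bits of `external_attr` and the MSDOS bits in the low byte. Other tools may or may not honor it, which is the "depends" part. The file name is made up.

```python
import io
import stat
import zipfile

DOS_READONLY = 0x01  # MSDOS read-only attribute bit

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("run.sh")           # hypothetical script
    info.create_system = 3                     # 3 = Unix, 0 = MSDOS/FAT
    unix_mode = stat.S_IFREG | 0o750           # regular file, rwxr-x---
    info.external_attr = (unix_mode << 16) | DOS_READONLY
    zf.writestr(info, "#!/bin/sh\necho hi\n")

# Read the entry back and split the field into its two halves.
with zipfile.ZipFile(buf) as zf:
    stored = zf.getinfo("run.sh")

dos_bits = stored.external_attr & 0xFF         # low byte: MSDOS attributes
mode = (stored.external_attr >> 16) & 0o777    # high bits: Unix permissions

print("read-only bit set:", bool(dos_bits & DOS_READONLY))
print("unix permissions :", oct(mode))
```

Whether an extractor actually restores the Unix half is up to the extractor, so this only shows that the information can be stored.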

They also allow for random access of the files, as they include an index (the central directory, stored at the end of the archive). This makes them better for directories you're still working with, but less effective at compression than the next contender.
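A minimal sketch of that random access with Python's `zipfile`, using made-up chapter files. Opening the archive reads the index, which records where each entry starts, so one file can be pulled out, or a new one appended, without unpacking the rest.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for i in range(5):
        zf.writestr(f"chapter{i}.txt", f"text of chapter {i}\n" * 20)

with zipfile.ZipFile(buf) as zf:
    # The index lists every entry along with the offset of its header.
    offsets = [info.header_offset for info in zf.infolist()]
    for info in zf.infolist():
        print(info.filename, "starts at byte", info.header_offset)
    last = zf.read("chapter4.txt")  # jumps to that entry, skips the others

# Append mode adds an entry and rewrites the index at the end.
with zipfile.ZipFile(buf, "a", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("chapter5.txt", "a late addition\n")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
print(names)
```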

Zip is also supported by most Linux distros, and is supported by Windows out of the box.

GZip Tar files

Go ahead and re-read the tar section, as it gets all of the benefits of tar, but with one crucial benefit added: Compression. Lots of it. Much more than zip can achieve.

Zip and GZip usually use the same compression algorithm under the hood (DEFLATE), so on a single file they land in roughly the same place: about 50% on average, and up to 95% on certain files. Zip just carries more bookkeeping per file. For reference, the difference for this file as of the writing of this colon is: GZip (1670) Zip (1870). Only about 200 bytes, but that's an 11% difference, which can matter for things like games and movies. The bigger benefit comes in when compressing an entire tar archive, though. Which I would do for the example, but I don't have zip installed and don't wanna.
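If you'd rather check the single-file gap than take my word for it, here's a rough sketch with Python's `gzip` and `zipfile` modules. The sample text and the `post.md` name are stand-ins I made up, not the actual file. The `percent_smaller` helper is also mine; it just redoes the arithmetic on the two sizes quoted above.

```python
import gzip
import io
import zipfile

# Made-up stand-in for "this file": some repetitive prose.
data = ("Tar collects files, zip compresses them one at a time.\n" * 60).encode()

# gzip: one compressed stream with a small header and trailer.
gz_bytes = gzip.compress(data, compresslevel=9)

# zip: same DEFLATE compression, plus a local header, an index entry,
# and an end-of-archive record.
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
    zf.writestr("post.md", data)
zip_bytes = zbuf.getvalue()


def percent_smaller(small, big):
    """How much smaller `small` is than `big`, as a percentage of `big`."""
    return 100 * (big - small) / big


print("original:", len(data), "gzip:", len(gz_bytes), "zip:", len(zip_bytes))
print(f"sizes from the post: {percent_smaller(1670, 1870):.1f}% smaller")
```

On a single small file, most of the gap is container overhead rather than better compression.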

The reason is, though, zip individually compresses each file and then archives the compressed files. GZip can only compress one file, which is usually a tar archive, which means it can look for data repetition across all files in the archive and compress multiple files together. This does mean there's basically no random access (it exists, but it's not worth it), but it also means the files can get down to intensely small sizes, and are much easier to transfer.
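Here's a rough Python sketch of that effect, using the standard `zipfile` and `tarfile` modules on 50 made-up config files that are nearly identical. It's a best-case setup for tar plus gzip, so the gap on real data will vary.

```python
import io
import tarfile
import zipfile

# Made-up data: 50 small files that share almost all of their content.
files = {
    f"config/site{i:02d}.ini": (
        f"[site]\nid = {i}\n" + "theme = dark\nlang = en\ncache = on\n" * 5
    ).encode()
    for i in range(50)
}


def zip_size(files):
    """Size of a zip where every file is compressed on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_gz_size(files):
    """Size of a tar archive compressed as one gzip stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


print("zip   :", zip_size(files), "bytes")
print("tar.gz:", tar_gz_size(files), "bytes")
```

One caveat: DEFLATE only looks back 32 KiB for repeats, so sharing across files works best when the files are small or the repetition sits close together.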

For support, Linux supports this out of the box, Windows doesn't. At all. You can get 7zip, which does support it, but natively there's nothing.

Conclusions

Are any of these better than the others? No, they each have their place.

Tar is going to be best for a local archive when storage space isn't a huge issue. It has the best performance, since it works about as fast as copying files, which works well for things like local backups.

Zip is going to be good when you want some compression, with the ability to modify files on the fly, or when you need compatibility with the widest variety of machines.

GZip tar files are going to be great for transferring many files between PCs, since they offer the highest compression ratio, especially across multiple files.



  1. Of fucking course it is, I'm sitting here talking about archive formats. Who else does this.
  2. Look I'm already talking out of my ass with Windows. I've used it, but not in about a year, and I've never thought to do half the crap I do now on Windows. Sue me.
