One way of managing disk space is to create an archive. An archive is a set of files that are packaged into a single, larger file. An archive can be composed of a few files, one or more directories, or an entire directory tree. Archives are useful for making backup copies of your data. For example, you can store an archive[1] on a different computer or on removable media such as magnetic tape. An archive is easy to manage because you treat it as a single file. In addition, compressing an archive saves more space than compressing the same files individually.
You may be familiar with archives in a Windows or Macintosh environment. Programs such as WinZip or pkzip create archives that end with a .zip extension. You might hear these archives referred to as zip files.
On UNIX, you create archives by using the tar command, which is short for tape archive. tar was designed for archiving data to tape, but you can also use it to archive data to a file, which is often called a tar file. Tar files typically end with a .tar extension. The extension is not required, but the convention lets people identify the file as an archive. Zip files are automatically created in a compressed format, but tar files are not. To compress a tar file, you either run a separate compression command such as compress or gzip on it, or, with tar versions that support it, pass a compression option such as -z when you create the archive. You can use the tar command to create an archive, to list the file names in an archive, or to extract[2] files from an archive.
The next three lessons describe these tasks.
How File Archives Are Used in Unix
File archives in Unix are used to consolidate multiple files and directories into a single file for easier management, storage, and transfer. The primary utility for creating archives in Unix is the `tar` command, and compressed archives often combine `tar` with compression utilities like `gzip`, `bzip2`, or `xz`.
Here’s how file archives are used in Unix:
Backup and Restore
Purpose: To create backups of files and directories.
Usage:
Create an archive:
tar -cvf backup.tar /path/to/directory
Extract from an archive:
tar -xvf backup.tar
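The two commands above can be tried safely in a scratch directory. The following is a minimal sketch, not a complete backup procedure: the directory and file names are made up for illustration, and -C is used so that member names are stored relative to the scratch directory and the restore lands in a separate location.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area; none of these names come from the lesson.
work=$(mktemp -d)
mkdir -p "$work/project/docs" "$work/restore"
printf 'hello\n' > "$work/project/readme.txt"
printf 'notes\n' > "$work/project/docs/notes.txt"

# Create the archive. -C changes directory first, so members are stored
# as "project/..." rather than under an absolute path.
tar -cf "$work/backup.tar" -C "$work" project

# Extract into a separate directory so the originals are not overwritten.
tar -xf "$work/backup.tar" -C "$work/restore"

# diff -r exits with status 0 only when the two trees match.
if diff -r "$work/project" "$work/restore/project"; then
    echo "restore matches original"
fi
```

Restoring into a separate directory and comparing the trees is a cheap way to confirm that a backup is usable before you rely on it.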
Compression and Decompression
Purpose: Reduce the size of an archive for storage or transmission.
Usage:
Create a compressed archive:
tar -cvzf archive.tar.gz /path/to/files
Extract a compressed archive:
tar -xvzf archive.tar.gz
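To see why compressing one archive tends to beat compressing files one by one, you can measure both approaches. The sketch below uses the two-step method (tar, then gzip) on a made-up directory of twenty identical files. The identical content is chosen to make the effect obvious; the savings on real data depend on how similar the files are.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)   # hypothetical scratch directory
mkdir "$work/logs"

# Twenty files with identical content (the numbers 1 to 2000, one per line).
i=1
while [ "$i" -le 20 ]; do
    seq 1 2000 > "$work/logs/file$i.txt"
    i=$((i + 1))
done

# Two-step method: archive first, then compress the archive.
tar -cf "$work/logs.tar" -C "$work" logs
gzip -c "$work/logs.tar" > "$work/logs.tar.gz"

# Alternative: compress every file on its own and add up the sizes.
individual=0
for f in "$work"/logs/*.txt; do
    size=$(gzip -c "$f" | wc -c | tr -d ' ')
    individual=$((individual + size))
done

archive=$(wc -c < "$work/logs.tar.gz" | tr -d ' ')
echo "archive compressed as a whole: $archive bytes"
echo "files compressed individually: $individual bytes"
```

The compressor can reuse patterns it has already seen across file boundaries inside the archive, which is not possible when each file is compressed in isolation.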
Software Distribution
Purpose: Distribute software packages as a single archive.
Usage:
Source code or binaries are packaged into .tar.gz or .tar.bz2 files for download and installation.
Example:
tar -xvzf software.tar.gz
cd software
./configure
make
sudo make install
File Organization
Purpose: Group related files for organization and versioning.
Selective Extraction
Purpose: Extract only certain files from a large archive.
Usage:
tar -xvf archive.tar path/to/file
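When you extract a single member, its name must match the path stored in the archive. The following sketch builds a small archive from made-up files and pulls out just one of them; every name in it is hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)   # hypothetical scratch directory
mkdir -p "$work/site/css" "$work/out"
printf 'body {}\n' > "$work/site/css/main.css"
printf '<html></html>\n' > "$work/site/index.html"
tar -cf "$work/site.tar" -C "$work" site

# Name the member exactly as tar stored it (check with "tar -tf").
tar -xf "$work/site.tar" -C "$work/out" site/css/main.css

# Only the requested file, plus the directories leading to it, is created.
find "$work/out" -type f
```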
Inspecting Archive Contents
Purpose: View the contents of an archive without extracting.
Usage:
tar -tvf archive.tar
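Because the listing is plain text with one member per line, you can filter it with ordinary tools before deciding what to extract. This is a small sketch with made-up file names; it uses -t without -v so that each line holds only the member name.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)   # hypothetical scratch directory
mkdir -p "$work/photos/2023" "$work/photos/2024"
: > "$work/photos/2023/a.jpg"
: > "$work/photos/2024/b.jpg"
: > "$work/photos/2024/notes.txt"
tar -cf "$work/photos.tar" -C "$work" photos

# Count the members whose names end in .jpg.
jpg_count=$(tar -tf "$work/photos.tar" | grep -c '\.jpg$')
echo "JPEG files in archive: $jpg_count"

# Test for a specific member before deciding to extract anything.
if tar -tf "$work/photos.tar" | grep -qx 'photos/2024/notes.txt'; then
    echo "notes.txt is present"
fi
```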
Incremental Backups
Purpose: Archive only files that have changed since the last backup.
Usage:
Use the --newer or --listed-incremental option (GNU tar):
tar --newer='2024-12-01' -cvf incremental_backup.tar /path/to/directory
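The --listed-incremental option works differently from a date cutoff: tar keeps a snapshot file describing what it has already archived and stores only what changed since then. The sketch below assumes GNU tar, since this option is not part of POSIX tar, and uses made-up names (state.snar, full.tar, incr.tar) for illustration.

```shell
#!/bin/sh
set -eu

# Assumes GNU tar; --listed-incremental is a GNU extension.
work=$(mktemp -d)   # hypothetical scratch directory
mkdir "$work/data"
printf 'old\n' > "$work/data/old.txt"

# Level 0: full backup. tar records what it saw in the snapshot file.
tar -cf "$work/full.tar" --listed-incremental="$work/state.snar" -C "$work" data

sleep 1   # make sure the next file gets a later timestamp
printf 'new\n' > "$work/data/new.txt"

# Level 1: reuse the same snapshot file; only changes are stored.
tar -cf "$work/incr.tar" --listed-incremental="$work/state.snar" -C "$work" data

# List the members of the incremental archive.
tar -tf "$work/incr.tar"
```

To restore, you would extract the full archive first and then each incremental archive in order.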
Tools Commonly Used with File Archives:
gzip, bzip2, xz: For compression (.gz, .bz2, .xz).
cpio: Another archiving tool, often used with pipelines.
zip/unzip: For creating .zip archives, common for cross-platform compatibility.
File archives streamline handling large numbers of files, ensure efficient storage, and facilitate easy recovery, making them a fundamental tool in Unix systems.
Filesystem Types
Before any disk partition can be used, a filesystem must be built on it.
When a filesystem is made, certain data structures are written to disk that will be used to access and organize the physical disk space into files.
Table 6-5 lists the most important filesystem types available on the various systems we are considering.
| Use | AIX | FreeBSD | HP-UX | Linux | Solaris | Tru64 |
| --- | --- | --- | --- | --- | --- | --- |
| Default local | jfs or jfs2 | ufs | vxfs | ext3, reiserfs | ufs | ufs or advfs |
| NFS | nfs | nfs | nfs | nfs | nfs | nfs |
| CD-ROM | cdrfs | cd9660 | cdfs | iso9660 | hsfs | cdfs |
| Swap | not needed | swap | swap, swapfs | swap | swap | not needed |
| DOS | not supported | msdos | not supported | msdos | pcfs | pcfs |
| /proc | procfs | procfs | not supported | procfs | procfs | procfs |
| RAM-based | not supported | mfs | not supported | ramfs, tmpfs | tmpfs | mfs |
| Other | union | union | hfs | ext2 | cachefs | cachefs |
Table 6-5. Important filesystem types
Unix Filesystems: Moments from History
In the beginning, there was the System V filesystem, and that is where we will start. This filesystem type once dominated System V–based operating systems. The superblock of standard System V filesystems contained information about currently available free space in the filesystem, in addition to information about how the space in the filesystem is allocated. It held the number of free inodes and data blocks, the first 50 free inode numbers, and the addresses of the first 100 free disk blocks.
After the superblock came the inodes, followed by the data blocks. The System V filesystem was designed for storage efficiency. It generally used a small filesystem block size: 2 KB or less. Traditionally, a block is the basic unit of disk storage; all files consume space in multiples of the block size, and any excess space in the last block cannot be used by other files and is therefore wasted. If a filesystem has a lot of small files, a small block size minimizes waste. However, small block sizes are much less efficient when transferring large files.
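The trade-off is easy to check with a little arithmetic. The sketch below defines a hypothetical helper, wasted_bytes, that rounds a file size up to whole blocks and reports the unused remainder. The 2,500-byte file and the 64 KB comparison size are illustrative assumptions, not figures from any particular system.

```shell
#!/bin/sh
set -eu

# wasted_bytes FILE_SIZE BLOCK_SIZE
# Prints the unused bytes in the last block of a file (hypothetical helper).
wasted_bytes() {
    size=$1
    block=$2
    blocks=$(( (size + block - 1) / block ))   # round up to whole blocks
    echo $(( blocks * block - size ))
}

# A 2,500-byte file stored with a small and with a large block size.
small=$(wasted_bytes 2500 2048)
large=$(wasted_bytes 2500 65536)
echo "2 KB blocks:  $small bytes wasted"
echo "64 KB blocks: $large bytes wasted"
```

Multiplied across thousands of small files, the difference in wasted space is what made a small block size attractive for storage efficiency.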
The System V filesystem type is obsolete at this point. It is still supported on some systems for backward compatibility purposes only.
The BSD Fast File System (FFS) was designed to remedy the performance limitations of the System V filesystem. It supports filesystem block sizes of up to 64 KB. Because merely increasing the block size to this level would have had a horrendous effect on the amount of wasted space, the designers introduced a subunit to the block known as the fragment. While the block remains the I/O transfer unit, the fragment becomes the disk storage unit (although only the final chunk of a file can be a fragment). Each block may be divided into one, two, four, or eight fragments.
Whatever its absolute performance status, the BSD filesystem is an unequivocal improvement over System V. For this reason, it was included in the System V.4 standard as the UFS filesystem type. This is its name on Solaris and Tru64 systems (as well as under FreeBSD). For a while, this filesystem dominated in the Unix arena.
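The effect of fragments on wasted space can be estimated with simple arithmetic. The sketch below is a simplified model of the rule described above, not the actual FFS allocator: full blocks hold everything except the tail of the file, and the tail is rounded up to whole fragments. The helper name allocated_bytes and the 70,000-byte file are assumptions chosen for illustration.

```shell
#!/bin/sh
set -eu

# allocated_bytes FILE_SIZE BLOCK_SIZE FRAGS_PER_BLOCK
# Hypothetical helper: space consumed when only the tail may use fragments.
allocated_bytes() {
    size=$1
    block=$2
    frag=$(( block / $3 ))
    full=$(( size / block ))                     # blocks filled completely
    tail=$(( size % block ))                     # leftover bytes at the end
    tail_frags=$(( (tail + frag - 1) / frag ))   # round tail up to fragments
    echo $(( full * block + tail_frags * frag ))
}

# A 70,000-byte file on a filesystem with 64 KB blocks.
no_frags=$(allocated_bytes 70000 65536 1)     # one fragment = a whole block
with_frags=$(allocated_bytes 70000 65536 8)   # eight 8 KB fragments per block
echo "without fragments: $no_frags bytes allocated"
echo "with 8 fragments:  $with_frags bytes allocated"
```

In this model the file still transfers in 64 KB blocks, but its tail occupies a single 8 KB fragment instead of a second full block.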
In the next lesson, you will learn to create an archive.
[1] archive: An archive is a set of files that are packaged as a single, large file.
[2] extract: To extract files from an archive means to copy them out of an archive and onto the filesystem.