In an operating system, filesystems are responsible for organizing and storing files and directories. They provide a way to access and manage the data stored on storage devices such as hard drives, SSDs, USB drives, magnetic tapes, optical discs, and RAM disks. Filesystems can also be shared over the network, and they can even be virtual (like procfs or sysfs).

Etymology

Before the age of computers, filing systems were used to store and organize paper documents. The concept of filesystems in computing is similar to the concept of filing systems in the real world. The words folder and directory came from there, too.

Filename

A filename identifies a file. It is a string of characters that lets us humans refer to a given file more easily. Depending on the filesystem, it may contain letters, numbers, and special characters, it has a maximum length, and it may or may not be case-sensitive.

File extension

In some filesystems, the filename can have an extension. This is a string of characters after the last dot in the filename. It can be used to identify the file type, but it is not mandatory, and it can also be misleading. Do not rely on the file extension to determine the file type!

In Unix-like systems, you can use the file command to determine the file type based on the file content.
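To illustrate what content-based detection means, here is a minimal Python sketch that compares the first bytes of the data against a few well-known signatures ("magic numbers"). The MAGIC table and the guess_type function are hypothetical names made up for this example; the real file command consults a far larger database and applies many more rules.

```python
# Illustrative only: a tiny subset of well-known file signatures.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x7fELF", "ELF executable"),
]


def guess_type(data: bytes) -> str:
    """Guess the type from the leading bytes, ignoring any filename."""
    for signature, description in MAGIC:
        if data.startswith(signature):
            return description
    return "unknown"


# A PDF stays a PDF even if someone names the file "holiday.jpg".
print(guess_type(b"%PDF-1.7\n..."))
```

The point of the sketch is only that the decision is based on the bytes inside the file, never on its name.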

Directory

File systems typically (but not necessarily) organize files into directories. A directory is a special type of file that contains a list of other files and directories. It is also called a folder.

File metadata

File systems store additional information about files, called metadata.

Some examples:

  • file size: the size of the file in bytes or in blocks
  • timestamps: when the file was created, last accessed, and last modified
  • ownership information: the user and group
  • permissions: who can read, write, and execute the file
  • file attributes: special flags like executable, read-only, hidden, system, etc.
  • device type: in Unix-like systems, everything is a file, including devices. The file system stores information about the device type (block device, character device, etc.)

Metadata is stored separately from the file content, for example in directory entries or in inodes.
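Most of this metadata can be queried without reading the file content. The following Python sketch uses the standard os.stat() call on a temporary file; the describe helper and the chosen dictionary keys are assumptions made for this example, and which fields are meaningful depends on the operating system and the filesystem.

```python
import os
import stat
import tempfile
import time


def describe(path: str) -> dict:
    """Collect a few metadata fields for path using os.stat()."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        "modified": time.ctime(st.st_mtime),
        "accessed": time.ctime(st.st_atime),
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "permissions": stat.filemode(st.st_mode),
        "is_directory": stat.S_ISDIR(st.st_mode),
    }


# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"hello")
    tmp.flush()
    info = describe(tmp.name)

print(info)
```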

Storage space organization

A local filesystem is typically organized into blocks; a block is the smallest unit of storage that can be allocated. The filesystem allocates blocks to store the file content, so even if you store only 1 byte in a file, the filesystem will allocate at least one whole block for it.

Block size varies between filesystems, but it is usually a power of 2, like 512 bytes, 1 KB, 4 KB, 8 KB, … and some filesystems allow you to specify the block size when creating them. A large block size can reduce fragmentation and improve performance, but it wastes space for small files; a small block size is more space-efficient for small files, but it can lead to more fragmentation and lower performance.
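The space trade-off is simple arithmetic. The sketch below is a simplified model (it ignores sparse files, inline data, and any tail-packing a real filesystem may do), and the function names are made up for this example.

```python
def blocks_needed(file_size: int, block_size: int) -> int:
    """Whole blocks required to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


def slack(file_size: int, block_size: int) -> int:
    """Bytes allocated but not used by the file content."""
    return blocks_needed(file_size, block_size) * block_size - file_size


# Compare how much space a 1-byte and a 5000-byte file waste.
for size in (1, 5000):
    for block_size in (512, 4096):
        print(size, block_size, blocks_needed(size, block_size), slack(size, block_size))
```

In this model a 5000-byte file wastes 120 bytes with 512-byte blocks, but 3192 bytes with 4 KB blocks.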

Fragmentation

When a file is stored in non-contiguous blocks, it is fragmented. This can happen when the filesystem cannot find a contiguous run of free blocks large enough for the file, so it splits the file into several separate pieces. Fragmentation can slow down file access - for example, on classic HDDs the disk head has to move to different locations to read the file. SSDs are less affected by fragmentation, but it can still have an impact on performance.
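One way to quantify fragmentation is to count the contiguous runs in the list of blocks that make up a file. The following Python sketch assumes the block numbers are given in file order; count_fragments is a hypothetical helper, not part of any filesystem tool.

```python
from typing import List


def count_fragments(blocks: List[int]) -> int:
    """Count contiguous runs in a file's ordered list of block numbers."""
    if not blocks:
        return 0
    fragments = 1
    for previous, current in zip(blocks, blocks[1:]):
        if current != previous + 1:  # a jump starts a new fragment
            fragments += 1
    return fragments


contiguous = [100, 101, 102, 103]
scattered = [100, 101, 250, 251, 90]
print(count_fragments(contiguous), count_fragments(scattered))  # 1 3
```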

Access control

Many filesystems support access control to restrict access to files and directories. There are many ways to control access, from the classical read-only flag, through permission bits (like in Unix-like systems), to more advanced ACLs (Access Control Lists) and capabilities.
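As a small illustration of Unix-style permission bits, the Python sketch below decodes an example mode with the standard stat module. The mode value 0o640 is just a value chosen for this example.

```python
import stat

mode = 0o640  # rw- for the owner, r-- for the group, --- for others

# Render the bits the way "ls -l" does (S_IFREG marks a regular file).
print(stat.filemode(stat.S_IFREG | mode))  # -rw-r-----

# Individual permissions are single bits that can be tested with a mask.
owner_can_write = bool(mode & stat.S_IWUSR)
group_can_write = bool(mode & stat.S_IWGRP)
others_can_read = bool(mode & stat.S_IROTH)
print(owner_can_write, group_can_write, others_can_read)  # True False False
```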

Quotas

Filesystems can also support quotas to limit the amount of space a user or a group can use. This can be useful in multi-user systems to prevent a single user from filling up the disk.

Data integrity

Many filesystems provide mechanisms to protect data integrity. Journaling and copy-on-write help to keep the on-disk structures consistent after a crash, while checksums make it possible to detect data that has been corrupted on the disk.
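The checksum idea can be sketched in a few lines of Python: store a checksum next to the data and recompute it on every read. CRC32 from the standard zlib module is used here only as a stand-in, and store and verify are hypothetical names; real filesystems choose their own algorithms and on-disk layout.

```python
import zlib


def store(data: bytes) -> tuple:
    """Return the data together with its checksum, as if written to disk."""
    return data, zlib.crc32(data)


def verify(data: bytes, checksum: int) -> bool:
    """Recompute the checksum on read and compare it with the stored one."""
    return zlib.crc32(data) == checksum


block, checksum = store(b"important data")
corrupted = b"importent data"  # one byte changed "on disk"

print(verify(block, checksum))      # True
print(verify(corrupted, checksum))  # False
```

Note that a checksum alone only detects corruption; repairing it requires a redundant copy of the data.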

Filesystem types

  • Disk-based filesystems: store data on hard drives, SSDs, USB drives, etc. (like ext4, NTFS, FAT32)
  • Network filesystems: allow accessing files over the network (like NFS, SMB)
  • Optical disk filesystems: used on CDs, DVDs, Blu-rays, etc. (like ISO 9660, UDF)
  • Flash filesystems: used on flash memory devices (like JFFS2, UBIFS)
  • Tape filesystems: used on magnetic tapes (like LTFS)
  • Database filesystems: store data in a database-like format (like IBM Db2 for i)
  • Virtual filesystems: provide an interface to kernel data structures (like procfs, sysfs, devfs)

Example 1: ext2/ext3/ext4

The ext filesystem family is one of the most popular filesystem families in the Linux world. The first member was ext (short for extended filesystem, since it was an extension of the original minix filesystem), developed in 1992, but it was quickly replaced by ext2 in 1993. And ext2 is basically still usable, since ext3 and ext4 are backward compatible with it.

The main addition of ext3 over ext2 was journaling, which helps to recover the filesystem after a crash. ext4 added many new features like extents, delayed allocation, and faster fsck, but those are largely transparent to the user: they are mostly performance and reliability improvements. The basics of the ext* filesystem family have been the same for more than 30 years.

Ext2 structure

Data is stored in blocks (usually 4 KB) and the filesystem is divided into block groups.

┌──────┬─────────┬─────────┬──...──┬─────────┐
│Boot  │ Block   │ Block   │       │ Block   │
│block │ group 0 │ group 1 │       │ group n │
└──────┴─────────┴─────────┴──...──┴─────────┘

Each block group contains:

  • a superblock: metadata about the filesystem
  • a block group descriptor table: metadata about the block group
  • a block bitmap: to track free and used blocks
  • an inode bitmap: to track free and used inodes
  • an inode table: to store metadata about files and directories
  • data blocks: to store file content

┌───────┬─────────────┬────────┬────────┬───────┬────────┐
│Super- │ Group       │ Block  │ Inode  │ Inode │ Data   │
│ block │ descriptors │ bitmap │ bitmap │ table │ blocks │
└───────┴─────────────┴────────┴────────┴───────┴────────┘
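The layout has some simple arithmetic consequences. In the classic ext2 layout each bitmap occupies a single block, so one group can track at most 8 × block size blocks, and an inode number maps directly to a block group and a slot in that group's inode table. The Python sketch below illustrates both; the function names are made up for this example, and the value 8192 for inodes per group is an assumed example value (the real one is read from the superblock).

```python
def max_blocks_per_group(block_size: int) -> int:
    """One bitmap block has block_size * 8 bits, one bit per tracked block."""
    return 8 * block_size


def locate_inode(inode_number: int, inodes_per_group: int) -> tuple:
    """Return (block group, index in that group's inode table)."""
    index = inode_number - 1  # inode numbers start at 1
    return index // inodes_per_group, index % inodes_per_group


# With 4 KB blocks a group covers 32768 blocks, i.e. 128 MiB.
print(max_blocks_per_group(4096))
print(max_blocks_per_group(4096) * 4096 // (1024 * 1024))

# Assumed example value: 8192 inodes per group.
print(locate_inode(2, 8192))     # (0, 1)
print(locate_inode(8193, 8192))  # (1, 0)
```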

Inodes store metadata about files and directories, like file size, timestamps, ownership, permissions, etc.

Inodes are numbered, and each inode has a unique number within the filesystem. The inode number is used to reference the inode from the directory entry. There are 15 block pointers in the inode; the first 12 are direct pointers to data blocks. If the file is larger than 12 blocks, the 13th pointer (the indirect block pointer) points to a block that contains pointers to data blocks. With 1 KB blocks and 4-byte pointers, such a block holds 256 pointers (1024 with 4 KB blocks). If the file is larger than 12 + 256 blocks, the 14th (double indirect) pointer is used, and if it’s even larger, the 15th (triple indirect) pointer is used, too.
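The block pointer scheme can be made concrete with a short Python sketch. It assumes 12 direct pointers followed by one single, one double, and one triple indirect pointer, and 4-byte block numbers; pointer_level is a hypothetical helper that only reports which level is needed, it does not read any real filesystem.

```python
def pointer_level(block_index: int, block_size: int = 1024) -> str:
    """Which inode pointer reaches the given 0-based block of a file."""
    per_block = block_size // 4  # 4-byte block pointers per indirect block
    levels = [
        ("direct", 12),
        ("single indirect", per_block),
        ("double indirect", per_block ** 2),
        ("triple indirect", per_block ** 3),
    ]
    for name, capacity in levels:
        if block_index < capacity:
            return name
        block_index -= capacity  # skip the blocks covered by this level
    raise ValueError("file too large for this block size")


print(pointer_level(11))    # direct
print(pointer_level(12))    # single indirect
print(pointer_level(268))   # double indirect (12 + 256 blocks are used up)
print(pointer_level(1035, block_size=4096))  # single indirect
```

With 1 KB blocks the direct and single indirect pointers together cover only 268 KB, which is why larger block sizes reach much bigger files before the deeper levels are needed.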

Directories are special files that contain a list of directory entries. Each directory entry contains the filename and the inode number of the file or directory.

Hard links are created by adding multiple directory entries pointing to the same inode. Soft (symbolic) links, in contrast, are special files that contain only the path to the target file.

Removing a file (rm) only removes one directory entry. The file content is not released until the last directory entry pointing to the inode is removed (and no process has the file open any more). At that point the inode and its data blocks are marked as free and can be reused. The system call for removing a directory entry is unlink() - since it’s basically the opposite of creating a link.
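This behaviour can be observed from user space. The Python sketch below assumes a Unix-like system and a filesystem that supports hard links; the file names are arbitrary examples created in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    original = os.path.join(directory, "original.txt")
    alias = os.path.join(directory, "alias.txt")

    with open(original, "w") as f:
        f.write("hello")

    # A hard link is a second directory entry for the same inode.
    os.link(original, alias)
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    links_before = os.stat(original).st_nlink

    # unlink() removes one directory entry; the inode stays alive.
    os.unlink(original)
    links_after = os.stat(alias).st_nlink
    with open(alias) as f:
        content = f.read()

print(same_inode, links_before, links_after, content)
```

The link count in the inode drops from 2 to 1, and the content is still readable through the remaining name.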

Example 2: ZFS

(I could have used Btrfs here, but ZFS is older, and has more features.)

ZFS (originally short for Zettabyte File System) is a modern filesystem developed by Sun Microsystems and released in 2005. It is designed to be robust, scalable, and easy to manage. ZFS has many advanced features like copy-on-write, snapshots, checksums, compression, deduplication, RAID-Z, etc. Oracle acquired Sun in 2010 and stopped publishing the source code; open development has continued in the OpenZFS project. It’s still a great filesystem, but it’s not as popular as it could be due to licensing issues: the CDDL license is widely considered incompatible with the GPL, so ZFS cannot be merged into the mainline Linux kernel and is typically shipped as a separate module.

ZFS features

  • Copy-on-write: When a block is modified, it is not overwritten in place, but a new block is written with the modified data.
  • Checksums: ZFS uses checksums to detect data corruption.
  • Self-healing: ZFS can detect and repair data corruption using checksums and redundant data.
  • Snapshots: ZFS can create read-only snapshots of the filesystem at any point in time.
  • Compression: ZFS can compress data on the fly to save space.
  • Deduplication: ZFS can deduplicate identical blocks to save space.
  • ZPools: ZFS uses zpools to manage storage devices. A zpool can consist of one or more storage devices, like hard drives, SSDs, or even files.
  • RAID: ZFS has built-in support for software RAID with different levels of redundancy (mirroring, and RAID-Z, which is similar to RAID-5).
  • Encryption: ZFS supports encryption of data at filesystem level.
  • SLOG and L2ARC: ZFS can use dedicated fast devices to improve performance: a SLOG (separate log device) for the ZIL (ZFS Intent Log) and cache devices for the L2ARC (Level 2 Adaptive Replacement Cache). Such setups are called hybrid storage pools.
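Copy-on-write and snapshots fit together naturally: since old blocks are never overwritten, a snapshot only has to remember which blocks were current at that moment. The following toy Python model illustrates the idea. CowStore and all of its methods are invented for this example; real ZFS uses a tree of checksummed block pointers and frees blocks that no snapshot references any more.

```python
class CowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = {}   # physical block id -> data (append-only here)
        self.live = {}     # logical block number -> physical block id
        self.next_id = 0

    def write(self, logical: int, data: bytes) -> None:
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data    # always write a fresh block
        self.live[logical] = block_id   # then repoint the live view

    def snapshot(self) -> dict:
        return dict(self.live)          # copies the mapping, not the data

    def read(self, logical: int, view: dict = None) -> bytes:
        mapping = self.live if view is None else view
        return self.blocks[mapping[logical]]


store = CowStore()
store.write(0, b"version 1")
snap = store.snapshot()
store.write(0, b"version 2")

print(store.read(0))        # b'version 2'
print(store.read(0, snap))  # b'version 1'
```

Taking the snapshot copies only the small mapping, which is why snapshots are cheap in this model: the old data block simply stays where it was.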