
Filesystems


OSv supports a variety of filesystems, which are described in the sections below. In terms of layering, it comes with the VFS layer (see fs/vfs/*) and the individual filesystem implementations found under fs/**/*, except for ZFS, which lives under bsd/sys/cddl/compat/opensolaris and bsd/sys/cddl/contrib/opensolaris. The ext2/3/4 filesystem is implemented by the libext and lwext4 modules.

Root Filesystem

During boot, OSv initially mounts the BootFS filesystem (see vfs_init() and mount_rootfs()) and then proceeds to mount and pivot to a 'real' filesystem like RoFS, ZFS, or VirtioFS (for details see this code in loader.cc), unless the --nomount kernel option was specified. The root filesystem can be explicitly selected using the --rootfs option; otherwise, the loader will try to discover it by attempting RoFS, then Virtio-FS, Ext, and finally ZFS.
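
For example, to force a specific root filesystem instead of relying on auto-discovery, one can pass the --rootfs option on the OSv command line (an illustrative invocation; /hello stands in for whatever application the image runs):

./scripts/run.py --execute='--rootfs=zfs /hello'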

Please note that OSv also supports the /etc/fstab file, where one could add an extra filesystem mount point. In addition, one can mount an extra filesystem by prepending appropriate options to the command line like so:

./scripts/run.py --execute='--rootfs=rofs --mount-fs=zfs,/dev/vblk0.2,/data /hello'
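
Alternatively, as an illustrative sketch of the /etc/fstab approach, an entry mounting a ZFS partition at /data might look like the line below, following the usual device / mount point / filesystem type / options layout (the exact fields are best checked against the fstab file that the build generates, e.g. build/last/fstab):

/dev/vblk0.2 /data zfs defaults 0 0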

To build an image with a specific type of filesystem, you need to specify the fs option (which defaults to zfs) like so:

./scripts/build image=tests fs=rofs

RamFS

RoFS

Ext

In March 2024, the ext2/3/4 filesystem support was added in the form of a shared pluggable module libext on top of the lwext4 library. Since this initial commit, it has been improved to add thread safety and to fix various bugs identified when running unit tests on an ext image.

The libext module acts as an adapter between the VFS layer and the lower-level implementation of the ext2/3/4 filesystem driver provided by the lwext4 module. The lwext4 module is a fork of the original lwext4 library that fixes a csum bug and customizes the building of the shared library, among other things. The libext module provides a set of ext_*() functions that fill the vfsops and vnops tables and delegate to the lwext4 layer.

Motivation

The intention is to provide a lightweight read-write filesystem alternative to ZFS. It comes with the following benefits:

  • familiar to Linux users with tools available on most distributions
  • small binaries totaling ~100K, compared to the ~800K libsolaris.so; libext.so is 32K and liblwext4 is 68K as of this writing
  • faster mount and boot time, similar to RoFS
  • smaller memory footprint
  • no dedicated kernel threads overhead (see https://github.com/cloudius-systems/osv/issues/247)

The main drawback is the I/O handling speed - ZFS is more sophisticated and thus faster.

The ideal use cases for ext involve almost-stateless (not completely ephemeral) applications or microservices that need to read AND write some data (logs, modifiable configuration, etc.) to disk. More serious data applications like databases, on the other hand, would greatly benefit from ZFS.

Please note that the ext support is also fairly minimal - it does not support xattr (extended attributes), journal recovery and transactions, or sparse files. As far as caching is concerned, lwext4 implements a simple RB-tree-based write-back cache for metadata blocks to efficiently read from and write to the i-node and block group tables. File data, on the other hand, is read from and written to the block device directly, without any page cache.

Build examples

./scripts/build fs=ext image=native-example                      #Builds image with ext mounted at /

./scripts/build fs=rofs_with_ext image=native-example -j$(nproc) #Builds image with rofs at / and ext mounted at /data

One can also use the new ext-disk-utils.sh script to mount an OSv ext image to inspect and manipulate its contents:

./scripts/ext-disk-utils.sh mount build/last/usr.img

ll build/release/usr.img.image/ #The contents of usr.img are available to read and write on the host

./scripts/ext-disk-utils.sh unmount build/last/usr.img /dev/nbd0

For more details on how to mount a secondary disk with the ext filesystem, please read this readme.

Ext2/3/4 file system documentation

ZFS

The ZFS code is based on the FreeBSD implementation from circa 2014 and has since been adapted to work in OSv. ZFS is a sophisticated filesystem that traces its roots to Solaris, and you can find some resources about it on this Wiki page. The majority of the ZFS code can be found under the subtree bsd/sys/cddl/. The ZFS filesystem driver was fairly recently extracted from the kernel into a separate shared library, libsolaris.so, which is dynamically loaded during boot from a different filesystem (most likely BootFS or RoFS) before the ZFS filesystem can be mounted.

There are three ways ZFS can be mounted on OSv:

  1. The first and the original one assumes mounting ZFS at the root (/) from the 1st partition of the 1st disk - /dev/vblk0.1.
  2. The second one involves mounting ZFS from the 2nd partition of the 1st disk - /dev/vblk0.2 at an arbitrary non-root mount point, for example /data.
  3. Similarly, the third way involves mounting ZFS from the 1st partition of the 2nd or higher disk - for example, /dev/vblk1.1 at an arbitrary non-root mount point as well. Please note that both the second and third options assume that the root filesystem is non-ZFS - most likely RoFS or Virtio-FS.

The disadvantage of the 1st option is that the code and data live in the same read-write filesystem, whereas the other two options allow one to isolate code from mutable data. Ideally, one would put all code and configuration on the RoFS partition and mutable data on a separate ZFS partition, either on the same disk (option 2) or on a different one (option 3). It has been shown that booting and mounting ZFS from a separate disk is also slightly faster (by 30-40ms) than the original option 1.

Below are examples of building and running OSv with ZFS.

ZFS mounted at /

This is the original and default method. Please note that libsolaris.so is part of loader.elf and is loaded from BootFS, which makes the kernel larger by ~800K.

./scripts/build image=native-example fs=zfs #The fs defaults to zfs

./scripts/run.py

OSv v0.56.0-152-gfd716a77
...
devfs: created device vblk0.1 for a partition at offset:4194304 with size:532676608
virtio-blk: Add blk device instances 0 as vblk0, devsize=536870912
...
zfs: driver has been initialized!
VFS: mounting zfs at /zfs
zfs: mounting osv/zfs from device /dev/vblk0.1
...

ZFS mounted from 2nd partition

This is a fairly new method that allows mounting ZFS at a non-root mount point, for example /data, mixed with another filesystem on the same disk. Please note that libsolaris.so is placed on the root filesystem (typically RoFS) under /usr/lib/fs/ and loaded from it automatically. The build script will implicitly add the relevant mount point line to /etc/fstab.

./scripts/build image=native-example,zfs fs=rofs_with_zfs #Has to add zfs module that adds /usr/lib/fs/libsolaris.so to RoFS 

./scripts/run.py

OSv v0.56.0-152-gfd716a77
...
devfs: created device vblk0.1 for a partition at offset:4194304 with size:191488
devfs: created device vblk0.2 for a partition at offset:4385792 with size:532676608
virtio-blk: Add blk device instances 0 as vblk0, devsize=537062400
...
VFS: mounting rofs at /rofs
zfs: driver has been initialized!
VFS: initialized filesystem library: /usr/lib/fs/libsolaris.so
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting sysfs at /sys
VFS: mounting ramfs at /tmp
VFS: mounting zfs at /data
zfs: mounting osv/zfs from device /dev/vblk0.2
...

ZFS mounted from a different disk

This fairly new method is similar to the above, in that it also allows ZFS to be mounted at a non-root mount point like /data, but this time from a different disk. Please note that libsolaris.so is placed on a root filesystem (typically RoFS) under /usr/lib/fs/ and loaded from it automatically as well. Similar to the above, the build script will implicitly add the relevant mount point line to the /etc/fstab.

./scripts/build image=native-example,zfs fs=rofs --create-zfs-disk #Creates empty disk at build/last/zfs_disk.img with ZFS filesystem

./scripts/run.py --second-disk-image build/last/zfs_disk.img

OSv v0.56.0-152-gfd716a77
...
devfs: created device vblk0.1 for a partition at offset:4194304 with size:1010688
virtio-blk: Add blk device instances 0 as vblk0, devsize=5204992
devfs: created device vblk1.1 for a partition at offset:512 with size:536870400
virtio-blk: Add blk device instances 1 as vblk1, devsize=536870912
...
VFS: mounting rofs at /rofs
zfs: driver has been initialized!
VFS: initialized filesystem library: /usr/lib/fs/libsolaris.so
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting sysfs at /sys
VFS: mounting ramfs at /tmp
VFS: mounting zfs at /data
zfs: mounting osv/zfs from device /dev/vblk1.1
...

However, with a different disk setup, you can manually make OSv mount a specific disk and partition by explicitly using the --mount-fs boot option like so:

#Build ZFS disk somehow differently and make sure the `build` does not append ZFS mount point (inspect build/last/fstab)
./scripts/run.py --execute='--rootfs=rofs --mount-fs=zfs,/dev/vblk1.1,/data /hello' --second-disk-image <disk_path>

Creating and manipulating ZFS disks on the host

Please note that in the examples above, the ZFS pool and filesystem are created using the zfs_loader.elf version of OSv, which executes zpool.so, zfs.so, and cpiod.so, among others. This is quite fast and efficient, but recently the build mechanism has been enhanced to create ZFS disks using the zpool and zfs tools on a Linux host, provided you have OpenZFS installed (see more info here). To that end, there is a fairly new script, zfs-image-on-host.sh, which can be used to either mount an existing OSv ZFS disk or create a new one. The latter can be orchestrated by the build script by passing the --use-openzfs option like so:

./scripts/build image=native-example fs=zfs -j$(nproc) --use-openzfs

Here is the help output from zfs-image-on-host.sh:

Manipulate ZFS images on the host using OpenZFS - mount, unmount, and build.

Usage: zfs-image-on-host.sh mount <image_path> <partition> <pool_name> <filesystem> |
                            build <image_path> <partition> <pool_name> <filesystem> <populate_image> |
                            unmount <pool_name>

Where:
  image_path      path to a qcow2 or raw ZFS image; defaults to build/last/usr.img
  partition       partition of disk above; defaults to 1
  pool_name       name of ZFS pool; defaults to osv
  filesystem      name of ZFS filesystem; defaults to zfs
  populate_image  boolean value to indicate if the image should be populated with content
                  from build/last/usr.manifest; defaults to true, but only used with the 'build' command

Examples:
  zfs-image-on-host.sh mount                                     # Mount OSv image from build/last/usr.img under /zfs
  zfs-image-on-host.sh mount build/last/zfs_disk.img 1           # Mount OSv image from build/last/zfs_disk.img 2nd partition under /zfs
  zfs-image-on-host.sh unmount                                   # Unmount OSv image from /zfs

Using the same script, you can always mount any OSv ZFS disk on the host, inspect and modify its files, and unmount it. OSv will then see all the changes when run with the same disk:

./scripts/zfs-image-on-host.sh mount build/last/zfs_disk.img
Connected device /dev/nbd0 to the image build/last/zfs_disk.img
Imported pool osv
Mounted osv/zfs at /zfs

[wkozaczuk@fedora-mbpro osv]$ find /zfs/
/zfs/
/zfs/seaweedfs
/zfs/seaweedfs/logs
/zfs/seaweedfs/logs/weed.osv.osv.log.WARNING.20220726-181118.2
/zfs/seaweedfs/logs/weed.osv.osv.log.INFO.20220726-180155.2
/zfs/seaweedfs/logs/weed.WARNING
/zfs/seaweedfs/logs/weed.INFO
/zfs/seaweedfs/master
/zfs/seaweedfs/master/snapshot
find: ‘/zfs/seaweedfs/master/snapshot’: Permission denied
/zfs/seaweedfs/master/log
/zfs/seaweedfs/master/conf

./scripts/zfs-image-on-host.sh unmount
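
The build subcommand shown in the help above can also be used directly on the host; for example, the following illustrative invocation (derived from the usage text, with populate_image set to false) would create an empty ZFS data disk:

./scripts/zfs-image-on-host.sh build build/last/zfs_disk.img 1 osv zfs false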

Resources

VirtioFS

NFS
