LD_PRELOAD library to inject O_DIRECT into file I/O
This library intercepts open() calls of the C standard library via the LD_PRELOAD mechanism and injects the O_DIRECT flag for files that match the name pattern given in the config file. This mechanism has the advantage that the source code of the actual application does not need to be modified to take advantage of O_DIRECT and that O_DIRECT can also be used in cases where the application could otherwise not specify O_DIRECT, e.g. in case of MPI-IO.
(click to expand)
The preload library is activated on a per-application basis through the LD_PRELOAD environment variable and reads two environment variables for configuration:
LD_VAST_PATHFILE[mandatory]: The path to a text file containing a list of newline-separated paths for which O_DIRECT should be injected. Wildcards "*" and "?" can be used.- The asterisk also matches slashes and thus subdirectories, e.g.
/data/*.xyzalso matches/data/dir1/dir2/myfile.xyz
- The asterisk also matches slashes and thus subdirectories, e.g.
LD_VAST_LOG_TOPICS[optional]: Enables console logging based on the flags defined asENV_LOG_TOPIC_...insource/Logger.h. E.g. useLD_VAST_LOG_TOPICS=2to print paths for which O_DIRECT gets injected. Or useLD_VAST_LOG_TOPICS=-1to enable most verbose logging.
Let's assume we have an application myapp and we know that this application reads several large files in the directory /data/myapp/input, which we want to speed up through injection of O_DIRECT.
For that, we first create a config file containing the corresponding path:
echo "/data/myapp/input/*" > /data/myapp/vastpreload.pathsNow we're ready to run the application as usual, just with LD_PRELOAD and the LD_VAST_PATHFILE environment variables prepended. (The path here assumes that the library has been installed to /usr/lib, e.g. via an .rpm or .deb package.)
LD_PRELOAD=/usr/lib/libvastpreload.so LD_VAST_PATHFILE=/data/myapp/vastpreload.paths ./myapp`Note: If we wanted to see where O_DIRECT gets injected, we could also add LD_VAST_LOG_TOPICS=2.
In this example, we use the I/O benchmarking tool ior together with MPICH to show how the preload library can be used for applications based on MPI-IO.
Note: Typical MPI-IO implementations (including ROMIO, which is used by MPICH) detect when they access files on an NFS mountpoint and then add POSIX file lock calls around all read & write operations to achieve coherence between different nodes. This locking is often unnecessary, e.g. when the nodes are reading a large input dataset or when they are writing to different output files.
Especially when O_DIRECT is used, any locking for coherence is unnecessary, because O_DIRECT bypasses caching and thus makes accesses automatically coherent.
MPICH provides the ROMIO_FSTYPE_FORCE="ufs" option to disable the unnecessary locking, which we will use here. This selects the ROMIO ufs (Universal Filesystem) driver instead of the nfs driver.
Note: By default, ior generates highly compressible data. Recent versions of ior have a new -l random option to make data less compressible, which we will use here.
We assume that the preload library has been installed on the system through an rpm/deb package or through make install to /usr/lib by a sysadmin, but it can generally also be used from other paths to avoid the need for root privileges.
First, we need to prepare a path file to configure for which paths O_DIRECT should be injected. This file needs to be in a location where all participating hosts can access it, so we just put it into our shared mountpoint /mnt/vast.
mkdir /mnt/vast/ior
echo '/mnt/vast/ior/*' > /mnt/vast/ior/vastpreload.pathsNow we're ready to run the actual ior command via mpirun. The LD_PRELOAD variable and corresponding preload library config options get transferred to the other hosts via the -env argument of mpirun.
ROMIO_FSTYPE_FORCE="ufs:/mnt/vast/ior" mpirun -env LD_VAST_PATHFILE=/mnt/vast/ior/vastpreload.paths -env LD_PRELOAD=/usr/lib/libvastpreload.so -env LD_VAST_LOG_TOPICS=2 --hostfile myhosts -np 16 ./ior -w -r -i 1 -t 1m -b 2g -g -k -K -F -l random -o /mnt/vast/ior/myfile -a MPIIOIn this example, we enabled extra logging through the optional LD_VAST_LOG_TOPICS=2 setting, so we will now see corresponding messages appear on the console about O_DIRECT being injected by the different MPI processes:
[...]
vastpreload: Injecting O_DIRECT. fd: 5; path: /mnt/vast/ior/myfile.0001
vastpreload: Injecting O_DIRECT. fd: 5; path: /mnt/vast/ior/myfile.0002
[...]- For OpenMPI, the environment variables are set via
-x name=value. - To disable automatic MPI-IO locking for NFS access, OpenMPI provides the
fs_ufs_lock_algorithm 1option.
Our example above would look like this with OpenMPI:
mpirun --mca fs_ufs_lock_algorithm 1 -x LD_VAST_PATHFILE=/mnt/vast/ior/vastpreload.paths -x LD_PRELOAD=/usr/lib/libvastpreload.so -x LD_VAST_LOG_TOPICS=2 --hostfile myhosts -np 16 ./ior -w -r -i 1 -t 1m -b 2g -g -k -K -F -l random -o /mnt/vast/ior/myfile -a MPIIOBuilding the VAST LD_PRELOAD Library requires a C++14 compatible compiler, such as gcc version 5.x or higher.
sudo apt install build-essential debhelper devscripts fakeroot git lintiansudo yum install gcc-c++ git make rpm-buildStart by cloning the main repository:
git clone https://github.com/vast-data/vast-preload-lib.git
cd vast-preload-libmake help will show you all build and rpm/deb packaging options.
This is the standard build command:
make -j $(nproc)You can use the library directly from the bin subdir (bin/libvastpreload.so), but you probably want to run make rpm or make deb now to build a package and install it on several hosts via yum/apt. On Ubuntu, run this:
make deb
sudo apt install ./packaging/vastpreload*.debThere you go. Happy streaming!