sim-se: add a faux-filesystem

This change introduces the concept of a faux-filesystem. The faux-filesystem
creates a directory structure in m5out (or whatever output dir the user
specifies) where system calls may be redirected.

This is useful to avoid non-determinism when reading files with varying path
names (e.g., variations from run-to-run if the simulation is scheduled on a
cluster where paths may change).

Also, this changeset allows circumventing host pseudofiles which have
information specific to the host processor (such as cache hierarchy or
processor information). Bypassing host pseudofiles can be useful when
executing runtimes in the absence of an operating system kernel since
runtimes may try to query standard files (e.g., /proc or /sys) which are not
relevant to an application executing in syscall emulation mode.

Change-Id: I90821b3b403168b904a662fa98b85def1628621c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/12119
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Committed by: Brandon Potter
Parent: e8d0b755ea
Commit: 54c77aa055
153
src/doc/se-files.txt
Normal file
@@ -0,0 +1,153 @@
Copyright (c) 2015-Present Advanced Micro Devices, Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met: redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer;
redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution;
neither the name of the copyright holders nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Authors: Brandon Potter

===============================================================================

This file exists to educate users and notify them that some filesystem open
system calls may have been redirected by system call emulation mode
(henceforth se-mode).

To provide background, system calls to open files with SYS_OPEN (man 2 open)
inside se-mode resolve by pass-through to glibc calls (man 3 open) on the
host machine. The host machine opens the file on behalf of the simulator.
Subsequently, se-mode acts as a shim for file access to the opened file. By
utilizing the host machine, se-mode gains quite a bit of utility without
needing to implement an actual filesystem.

A scenario for using normal files might be `/bin/cat $HOME/my_data_file`
as the simulated application (and its argument). The simulator leverages the
host file system to provide access to my_data_file in this case. Several
things happen inside the simulator:
    1) The cat command will open $HOME/my_data_file by invoking the open
       system call (SYS_OPEN). In se-mode, SYS_OPEN is trapped by the
       simulator and the syscall_emul.hh:openImpl implementation is provided
       as a drop-in replacement for what normally occurs inside a real
       operating system.
    2) The openImpl code will pass through several path checks and realize
       that the file needs to be handled in the 'normal' case where se-mode
       utilizes the host filesystem.
    3) The openImpl code will use the glibc open library call on
       $HOME/my_data_file after normalizing invocation options.
    4) If the file successfully opens, se-mode will record the file
       descriptor returned from the glibc open and provide a translated file
       descriptor to the application. (If the host file descriptor from
       glibc were passed back to the application, it would be noticeable
       that the application's runtime environment was wonky. The
       gem5.{opt,debug,fast} process needs to open files for its own
       purposes, so from the simulated application's perspective the file
       descriptors would appear out-of-order and arbitrary. They should
       appear in-order, with the lowest available file descriptor assigned
       on each call to SYS_OPEN. So, se-mode adds a level of indirection to
       resolve this problem.)
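
The file descriptor indirection in step 4 can be sketched in a few lines of
Python. This is only an illustration of the idea, not the gem5 code (which is
C++); the FdTable class and its method names are hypothetical, and the host
descriptor numbers are made up.

```python
class FdTable:
    """Hypothetical map from simulated descriptors to host descriptors."""

    def __init__(self):
        self._sim_to_host = {}

    def allocate(self, host_fd):
        # Hand out the lowest simulated descriptor that is not in use,
        # regardless of which number the host happened to return.
        sim_fd = 0
        while sim_fd in self._sim_to_host:
            sim_fd += 1
        self._sim_to_host[sim_fd] = host_fd
        return sim_fd

    def host_fd(self, sim_fd):
        # Translate an application-visible descriptor back to the host one.
        return self._sim_to_host[sim_fd]

    def release(self, sim_fd):
        # Free the simulated descriptor so a later open can reuse it.
        return self._sim_to_host.pop(sim_fd)


table = FdTable()
for std_fd in (0, 1, 2):        # stdin, stdout, stderr
    table.allocate(std_fd)

first = table.allocate(57)      # host descriptors look arbitrary ...
second = table.allocate(12)     # ... but the application sees 3, then 4
table.release(first)
reused = table.allocate(99)     # the freed slot 3 is handed out again
```

The application only ever sees the keys of the table, so descriptors that the
simulator process opens for its own purposes never leak into the sequence.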

However, there are files which users might not want to open on the host
machine; providing file access and/or file visibility to the simulated
application may not make sense in these cases. Historically, these files
have been handled by os-specific code in se-mode. The os-specific
implementation has been referred to as 'special files'. Examples of
special file implementations include /proc/meminfo and /etc/passwd. (See
src/kern/linux/linux.cc for more details.)

A scenario for using special files might be running `/bin/cat /proc/meminfo`
as the simulated application (and its argument). Several things will happen
inside the simulator:
    1) The cat command will open the /proc/meminfo file by invoking the open
       system call (SYS_OPEN). In se-mode, SYS_OPEN is trapped by the
       simulator and the syscall_emul.hh:openImpl implementation is provided
       as a drop-in replacement for what normally occurs inside a real
       operating system.
    2) The openImpl code checks to see if /proc/meminfo matches a special
       file. When it notices the match, it invokes code to generate a
       replacement file rather than open the file on the host machine. (As
       it turns out, the host's version of /proc/meminfo describes the host
       machine running the gem5 executable rather than the simulated system,
       which is probably not what the application intended.)
    3) The generated file is provided a file descriptor (which itself has
       special handling to preserve the illusion that the application is not
       running inside a simulator under weird conditions). The file
       descriptor is passed back to the application and it can subsequently
       use the file descriptor to access the redirected /proc/meminfo file.
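
The special-file check in step 2 amounts to a lookup from path to generator
function. The Python sketch below shows that shape under stated assumptions:
the real implementation is C++ in src/kern/linux/linux.cc, and the names
SPECIAL_FILES, generate_meminfo, and open_special, as well as the simplified
two-line meminfo format, are hypothetical.

```python
import io


def generate_meminfo(state):
    # Build the contents from simulator state at the moment of the open.
    total_kb = state["mem_size_bytes"] // 1024
    free_kb = total_kb - state["mem_used_bytes"] // 1024
    return "MemTotal: %d kB\nMemFree: %d kB\n" % (total_kb, free_kb)


# Hypothetical table of paths that must never reach the host filesystem.
SPECIAL_FILES = {"/proc/meminfo": generate_meminfo}


def open_special(path, state):
    """Return a file-like object for a special file, or None otherwise."""
    generator = SPECIAL_FILES.get(path)
    if generator is None:
        return None             # caller falls back to the normal case
    return io.StringIO(generator(state))


state = {"mem_size_bytes": 512 * 1024 * 1024,
         "mem_used_bytes": 128 * 1024 * 1024}
contents = open_special("/proc/meminfo", state).read()
not_special = open_special("/etc/hostname", state)
```

Because the generator runs when the file is opened, the contents can reflect
whatever the simulated application has done up to that point.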

Regarding special files, a subtle but important point is that these files
are generated dynamically during simulation (in C++ code). Certain files,
such as /proc/meminfo, depend on the application state inside the simulator
to have valid contents. With some files, you generally cannot anticipate what
the file contents should be before the application actually tries to inspect
them. These types of files should all be handled using the special files
method.

As an aside, users might also want to restrict the contents of a file to
prevent non-determinism in the simulation. (This is another case for special
handling of files.) It can be annoying to try to generate statistics for your
new hardware widget (which of course will improve performance by some
non-trivial percentage) when variance in the statistics is caused by
randomness of file contents. A specific example which comes to mind is
reading the contents of /dev/random. Ideally, se-mode should introduce no
non-determinism. However, that is difficult (if not impossible) to achieve in
practice for every application thrown at the simulator.
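
To make the /dev/random example concrete, the sketch below shows one way file
contents could be pinned down: derive the bytes from a fixed seed so that
every run reads the same data. This is purely illustrative. The text above
does not say how se-mode treats /dev/random, and the deterministic_bytes
function and the seed value are hypothetical.

```python
import random


def deterministic_bytes(seed, count):
    # A seeded generator yields the same byte sequence on every run, unlike
    # a read from the host's /dev/random.
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(count))


run_a = deterministic_bytes(seed=42, count=16)
run_b = deterministic_bytes(seed=42, count=16)
other_length = deterministic_bytes(seed=42, count=4)
```

With contents fixed this way, differences in statistics between two runs can
be attributed to the hardware change under study rather than to the input.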

In addition to special files, there is another method to handle filesystem
redirection. Instead of dynamically generating a file and providing it to
the application, it is possible to pregenerate files on the host filesystem
and redirect open calls to the pregenerated files. This is achieved by
capturing the paths that the application passes to SYS_OPEN and modifying
each path before issuing the pass-through call to the host filesystem glibc
open. The name for this feature is 'faux filesystem' (henceforth faux-fs).

With faux-fs, users can add paths via the command line (via --chroot) or by
modifying their configuration file to use the RedirectPath class. These
paths take the form of original_path-->set_of_modified_paths. For instance,
/proc/cpuinfo might be redirected to /usr/local/gem5_fs/cpuinfo __OR__
/home/me/gem5_folder/cpuinfo __OR__ /nonsensical_name/foo_bar, etc. The
matching pattern and directory/file-structure is controlled by the user. The
pattern match hits on the first available file which actually exists on the
host machine.
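
The "first file which actually exists" rule can be sketched as follows. This
is a standalone illustration and does not use gem5's RedirectPath class; the
redirect function and the (prefix, host paths) tuple layout are hypothetical.
Returning the original path when nothing matches is an assumption, since the
text does not state what happens in that case. Prefixes are assumed to have
no trailing slash.

```python
import os
import tempfile


def redirect(path, redirects):
    """Return the first existing host path for `path`, else `path` itself."""
    for app_prefix, host_prefixes in redirects:
        if path == app_prefix or path.startswith(app_prefix + "/"):
            suffix = path[len(app_prefix):]
            for host_prefix in host_prefixes:
                candidate = host_prefix + suffix
                if os.path.exists(candidate):
                    return candidate
    return path                 # assumption: unmatched paths pass through


with tempfile.TemporaryDirectory() as outdir:
    missing = os.path.join(outdir, "not_there")
    faux_proc = os.path.join(outdir, "fs", "proc")
    os.makedirs(faux_proc)
    with open(os.path.join(faux_proc, "cpuinfo"), "w") as f:
        f.write("processor : 0\n")

    redirects = [("/proc", [missing, faux_proc])]
    hit = redirect("/proc/cpuinfo", redirects)
    hit_is_faux = hit == os.path.join(faux_proc, "cpuinfo")
    miss = redirect("/etc/hostname", redirects)
    lookalike = redirect("/procfoo", redirects)
```

Here the first candidate directory does not exist, so the lookup falls through
to the second one; paths outside the redirected prefix are left untouched.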

As another subtle point, the faux-fs handling is fixed at simulator
configuration time. The path redirection becomes static after configuration,
and the Python generated files in simout/fs/.. also exist after
configuration. The faux-fs mechanism is __NOT__ suitable for files such as
/proc/meminfo since those types of files rely on runtime application
characteristics.

Currently, faux-fs is set up to create a few files on behalf of the average
user. These files are all stuffed into the simout directory under a 'fs'
folder. By default, the path is $gem5_dir/m5out/fs. These files are all
hardcoded in the configuration since it is unlikely that an application wants
to see the host version of the files. At the time of writing, the list can be
viewed in configs/example/se.py by searching for RedirectPath. Most of
the faux-fs Python generated files depend on simulator configuration (e.g.,
number of cores, caches, nodes, etc.). Sophisticated runtimes might query
these files for hardware information in certain applications (e.g.,
applications using MPI or ROCm since these runtimes utilize libnuma.so).
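
As a rough picture of what "Python generated files" means here, the sketch
below writes a cpuinfo-like file into an fs/proc folder under an output
directory, with one entry per simulated core. It is not taken from
configs/example/se.py; the write_faux_cpuinfo function and the minimal file
contents are hypothetical, and a temporary directory stands in for m5out.

```python
import os
import tempfile


def write_faux_cpuinfo(outdir, num_cpus):
    # Files are produced once, at configuration time, from values that are
    # already known then (here, only the core count).
    procdir = os.path.join(outdir, "fs", "proc")
    os.makedirs(procdir, exist_ok=True)
    path = os.path.join(procdir, "cpuinfo")
    with open(path, "w") as f:
        for cpu in range(num_cpus):
            f.write("processor : %d\n\n" % cpu)
    return path


with tempfile.TemporaryDirectory() as outdir:
    cpuinfo_path = write_faux_cpuinfo(outdir, num_cpus=4)
    relative = os.path.relpath(cpuinfo_path, outdir)
    with open(cpuinfo_path) as f:
        text = f.read()

processor_lines = [line for line in text.splitlines()
                   if line.startswith("processor")]
```

Everything written this way is known before simulation starts, which is why
the approach suits cpuinfo-style files and not files like /proc/meminfo.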

Of note, dynamically linked executables will open shared object files in the
same manner as normal files. It is possible and maybe even preferable to
utilize the faux-fs to create a platform independent way of running
applications in se-mode. Users can stuff all the shared libraries into a
folder and commit the folder as part of their repository state. The chroot
option can be made to point to the shared library folder (for each library)
and these libraries will be redirected away from host libraries. This can
help to alleviate environment problems between machines.

If there is any confusion on path redirection, the system call debug traces
can be used to emit information regarding path redirection.