CPUs have historically instantiated the architecture specific version
of the TLBs to avoid a virtual function call, making them a little bit
more dependent on what the current ISA is. Some simple performance
measurement, the x86 twolf regression on the atomic CPU, shows that
there isn't actually any performance benefit, and if anything the
simulator goes slightly faster (although still within margin of error)
when the TLB functions are virtual.
This change switches everything outside of the architectures themselves
to use the generic BaseTLB type, and then inside the ISA for them to
cast that to their architecture specific type to call into architecture
specific interfaces.
The ARM TLB needed the most adjustment since it was using non-standard
translation function signatures. Specifically, they all took an extra
"type" parameter which defaulted to normal, and translateTiming
returned a Fault. translateTiming actually doesn't need to return a
Fault because everywhere that consumed it just stored it into a
structure which it then deleted(?), and the fault is stored in the
Translation object when the translation is done.
A little more work is needed to fully obviate the arch/tlb.hh header,
so the TheISA::TLB type is still visible outside of the ISAs.
Specifically, the TlbEntry type is used in the generic PageTable which
lives in src/mem.
Change-Id: I51b68ee74411f9af778317eff222f9349d2ed575
Reviewed-on: https://gem5-review.googlesource.com/6921
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
GCC 7.2 is much stricter than previous GCC versions. The following changes
are needed:
* There is now a warning if there is an implicit fallthrough between two
case statments. C++17 adds the [[fallthrough]]; declaration. However,
to support non C++17 standards (i.e., C++11), we use M5_FALLTHROUGH.
M5_FALLTHROUGH checks for [[fallthrough]] compliant C++17 compiler and
if that doesn't exist, it defaults to nothing (no older compilers
generate warnings).
* The above resulted in a couple of bugs that were found. This is noted
in the review request on gerrit.
* throw() for dynamic exception specification is deprecated
* There were a couple of new uninitialized variable warnings
* Can no longer perform bitwise operations on a bool.
* Must now include <functional> for std::function
* Compiler bug for void* lambda. Changed to auto as work around. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82878
Change-Id: I5d4c782a4e133fa4cdb119e35d9aff68c6e2958e
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-on: https://gem5-review.googlesource.com/5802
Reviewed-by: Gabe Black <gabeblack@google.com>
Explicitly separate the way the data is represented in the underlying
representation from how it's represented in the instruction.
In order to make the ISA parser happy, the Mem operand needs to have
a single, particular type. To handle that with scalar types, we just
used uint64_ts and then worked with values that were smaller than the
maximum we could hold. To work with these new array values, we also
use an underlying uint64_t for each element.
To make accessing the underlying memory system more natural, when we
go to actually read or write values, we translate the access into an
array of the actual, correct underlying type. That way we don't have
non-exact asserts which confuse gcc, or weird endianness conversion
which assumes that the data should be flipped 8 bytes at a time.
Because the functions involved are generally inline, the syntactic
niceness should all boil off, and the final implementation in the
binary should be simple and efficient for the given data types.
Change-Id: I14ce7a2fe0dc2cbaf6ad4a0d19f743c45ee78e26
Reviewed-on: https://gem5-review.googlesource.com/6582
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Print faulting instruction for unmapped address panic in faults.cc
and print extra info about corresponding fetched PC in base.cc.
Change-Id: Id9e15d3e88df2ad6b809fb3cf9f6ae97e9e97e0f
Reviewed-on: https://gem5-review.googlesource.com/6461
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
The FSW and TOP values are technically part of the same register, but
they have very different behaviors. One of them can be renamed and
float along without affecting global state, while the other requires
serialization. They just need to *look* like the same register when
read by the user.
Also, there was a missing break in setMiscRegNoEffect.
Change-Id: If58de0f566f65068208240f4001209fb9e1826d6
Reviewed-on: https://gem5-review.googlesource.com/6441
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
These files aren't a collection of miscellaneous stuff, they're the
definition of the Logger interface, and a few utility macros for
calling into that interface (panic, warn, etc.).
Change-Id: I84267ac3f45896a83c0ef027f8f19c5e9a5667d1
Reviewed-on: https://gem5-review.googlesource.com/6226
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>
In the ISA instruction definitions, some classes were declared with
execute, etc., functions outside of the main template because they
had CPU specific signatures and would need to be duplicated with
each CPU plugged into them. Now that the instructions always just
use an ExecContext, there's no reason for those templates to be
separate. This change folds those templates together.
Change-Id: I13bda247d3d1cc07c0ea06968e48aa5b4aace7fa
Reviewed-on: https://gem5-review.googlesource.com/5401
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Alec Roelke <ar4jc@virginia.edu>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
The ISA parser used to generate different copies of exec functions
for each exec context class a particular CPU wanted to use. That's
since been changed so that those functions take a pointer to the base
ExecContext, so the code which would generate those extra functions
can be removed, and some functions which used to be templated on an
ExecContext subclass can be untemplated, or minimally less templated.
Now that some functions aren't going to be instantiated multiple times
with different signatures, there are also opportunities to collapse
templates and make many instruction definitions simpler within the
parser. Since those changes will be less mechanical, they're left for
later changes and will probably be done in smaller increments.
Change-Id: I0015307bb02dfb9c60380b56d2a820f12169ebea
Reviewed-on: https://gem5-review.googlesource.com/5381
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Generating dependency/build product information in the isa parser breaks scons
idea of how a build is supposed to work. Arm twisting it into working forced
a lot of false dependencies which slowed down the build.
Change-Id: Iadee8c930fd7c80136d200d69870df7672a6b3ca
Reviewed-on: https://gem5-review.googlesource.com/5081
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
ARM systems require the coordination of the global and local
monitors. When the system is run without caches the global monitor is
implemented in the abstract memory object. This change adds a callback
from the abstract memory that notifies the local monitor when the
global monitor is cleared.
Additionally, for ARM systems the local monitor signals the event
register and wakes the thread context up. Subsequent wait-for-event
(WFE) instructions will be immediately signaled.
Change-Id: If6c038f3a6bea7239ba4258f07f39c7f9a30500b
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/3760
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
MOV Rd,Cd is MR encoded but the control register is operand 2
not operand 1 hence this needs to be MODRM_REG not MODRM_RM.
While MOV Cd,Rd is RM encoded registers are also swapped, so
it also needs to be MODRM_REG as well (as it already correctly is).
This fixes incorrect UD2 reportings leading to invalid traps
reported in O3 on X86 FS introduced with 4e939a7 .
Change-Id: Ib33c8ba87b00e0264d33da44fff64ed9e4d2d9d8
Reviewed-on: https://gem5-review.googlesource.com/4861
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
A condition can be specified which will tell the decoder whether to return
the instruction being requested, or, if the condition fails, UD2.
Change-Id: I0f1c075deb10754ce1dd88be1726a196294e41fd
Reviewed-on: https://gem5-review.googlesource.com/4580
Reviewed-by: Michael LeBeane <Michael.Lebeane@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>
This fixes the function call to clone in syscall_emul.hh where
the x86 version should be called before the base implementation
of clone.
Change-Id: Iccd2f680ff6e3a5536037d688a80ab3f236bbd98
Signed-off-by: Sean Wilson <spwilson2@wisc.edu>
Reviewed-on: https://gem5-review.googlesource.com/3902
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This patch adds some more functionality to the cpu model and the arch to
interface with the vector register file.
This change consists mainly of augmenting ThreadContexts and ExecContexts
with calls to get/set full vectors, underlying microarchitectural elements
or lanes. Those are meant to interface with the vector register file. All
classes that implement this interface also get an appropriate implementation.
This requires implementing the vector register file for the different
models using the VecRegContainer class.
This change set also updates the Result abstraction to contemplate the
possibility of having a vector as result.
The changes also affect how the remote_gdb connection works.
There are some (nasty) side effects, such as the need to define dummy
numPhysVecRegs parameter values for architectures that do not implement
vector extensions.
Nathanael Premillieu's work with an increasing number of fixes and
improvements of mine.
Change-Id: Iee65f4e8b03abfe1e94e6940a51b68d0977fd5bb
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
[ Fix RISCV build issues and CC reg free list initialisation ]
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2705
With the hierarchical RegId there are a lot of functions that are
redundant now.
The idea behind the simplification is that instead of having the regId,
telling which kind of register read/write/rename/lookup/etc. and then
the function panic_if'ing if the regId is not of the appropriate type,
we provide an interface that decides what kind of register to read
depending on the register type of the given regId.
Change-Id: I7d52e9e21fc01205ae365d86921a4ceb67a57178
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
[ Fix RISCV build issues ]
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2702
Replace the unified register mapping with a structure associating
a class and an index. It is now much easier to know which class of
register the index is referring to. Also, when adding a new class
there is no need to modify existing ones.
Change-Id: I55b3ac80763702aa2cd3ed2cbff0a75ef7620373
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
[ Fix RISCV build issues ]
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2700
GDB breaks if more bytes are sent than the transmitted registers
actually need. Therefore the GdbRegCache struct needs to be packed to
prevent padding at the end.
Change-Id: Ib2c14eb70becdac609eb4f475d5dddbd5bcc60da
Signed-off-by: Matthias Hille <matthiashille8@gmail.com>
Reviewed-on: https://gem5-review.googlesource.com/3020
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Remove redundant information from the ExtMachInst, hash the vex
information to ensure the decode cache works properly, print the vex info
when printing an ExtMachInst, consider the vex info when comparing two
ExtMachInsts, fold the info from the vex prefixes into existing settings,
remove redundant decode code, handle vex prefixes one byte at a time and
don't bother building up the entire prefix, and let instructions that care
about vex use it in their implementation, instead of developing an entire
parallel decode tree.
This also eliminates the error prone vex immediate decode table which was
incomplete and would result in an out of bounds access for incorrectly
encoded instructions or when the CPU was mispeculating, as it was (as far
as I can tell) redundant with the tables that already existed for two and
three byte opcodes. There were differences, but I think those may have
been mistakes based on the documentation I found.
Also, in 32 bit mode, the VEX prefixes might actually be LDS or LES
instructions which are still legal in that mode. A valid VEX prefix would
look like an LDS/LES with an otherwise invalid modrm encoding, so use that
as a signal to abort processing the VEX and turn the instruction into an
LES/LDS as appropriate.
Change-Id: Icb367eaaa35590692df1c98862f315da4c139f5c
Reviewed-on: https://gem5-review.googlesource.com/3501
Reviewed-by: Joe Gross <joe.gross@amd.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
When the LiveProcess class was renamed to be just Process, the CL author
also changed the syscall function from a virtual function into a regular
one. Unfortunately, the I386Process class overrode the syscall function
to adjust the return address so that control would return to the right
place. Without that adjustment, 32 bit x86 process would segfault and die
immediately after their first system call.
This change reinstates the virtual specifier on the base syscall function,
and adds an override keyword on the I386Process's version so that it won't
be orphaned again in the future. It also fixes some small style issues the
style checker script complained about.
Change-Id: I0d1178ea0eda6676050c8fc043820a2bb4d99c0d
Reviewed-on: https://gem5-review.googlesource.com/3500
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
The new version modularizes the implementation of the various commands,
gets rid of dynamic allocation of the register cache, fixes some small
style problems, and uses exceptions to simplify error handling internal to
the GDB stub.
Change-Id: Iff3548373ce4adfb99106a810f5713b769df89b2
Reviewed-on: https://gem5-review.googlesource.com/3280
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Boris Shingarov <shingarov@gmail.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
If the operands were 64 bit, an intermediate calculation could lose a
carry bit. This change rearranges that intermediate calculation if the
operand width is large, and reworks the microop implementation in general
in an attempt to make it easier to understand.
Change-Id: Ib36333f3f2695a33cd9623e43682de22ebd2e7ea
Reviewed-on: https://gem5-review.googlesource.com/3381
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
When a branch micro-op belongs to a flow and the micro-op does not change
the nPC and just updates the nuPC (like a 'rep movs' flow), branching()
function always returns not-taken no matter actual micro-branch outcome.
Provided fix adds to the equation nuPC attribute checking since these kind
of branch micro-op only updates that pointer.
This issue has been found while debugging the performance of a copy-loop
implemented with memcopy function. Without the fix, 'rep movss' internal
micro-branch was always predicted as not-taken causing an squash event
after every branch micro-branch execution.
Using the provided test, branch mispredition went from 1922 without the fix
to 7.
Change-Id: I1bcbefae26aef47e3135817ef99b53d0ea0a98fa
This changeset sets the implementation policy for a subset of
system calls to the ignoreFunc implementation (for x86 only).
The ignored system calls likely will never be implemented and
this allows a warning to be issued instead of the simulation
exiting with a fatal.
Change-Id: I8d9741ad683151e88cc71156d3602e2d0ccb0acf
Reviewed-on: https://gem5-review.googlesource.com/2270
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Michael LeBeane <Michael.Lebeane@amd.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
This changeset extends the pipe system call to work with
architectures other than Alpha (and enables the syscall for
x86). For the dup system call, it sets the clone-on-exec
flag by default. For the dup2 system call, the changeset
adds an implementation (and enables it for x86).
Change-Id: I00ddb416744ee7dd61a5cd02c4c3d97f30543878
Reviewed-on: https://gem5-review.googlesource.com/2266
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Michael LeBeane <Michael.Lebeane@amd.com>
This changeset adds refactors the existing open system call,
adds the openat variant (enabled for x86 builds), and adds
additional "special file" test cases for /proc/meminfo and
/etc/passwd.
Change-Id: I6f429db65bbf2a28ffa3fd12df518c2d0de49663
Reviewed-on: https://gem5-review.googlesource.com/2265
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Michael LeBeane <Michael.Lebeane@amd.com>
The Process class is full of implementation details and
structures related to SE Mode. This changeset factors out an
internal class from Process and moves it into a separate file.
The purpose behind doing this is to clean up the code and make
it a bit more modular.
Change-Id: Ic6941a1657751e8d51d5b6b1dcc04f1195884280
Reviewed-on: https://gem5-review.googlesource.com/2263
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Modifies the clone system call and adds execve system call. Requires allowing
processes to steal thread contexts from other processes in the same system
object and the ability to detach pieces of process state (such as MemState)
to allow dynamic sharing.
This changeset adds functionality that allows system calls to retry without
affecting thread context state such as the program counter or register values
for the associated thread context (when system calls return with a retry
fault).
This functionality is needed to solve problems with blocking system calls
in multi-process or multi-threaded simulations where information is passed
between processes/threads. Blocking system calls can cause deadlock because
the simulator itself is single threaded. There is only a single thread
servicing the event queue which can cause deadlock if the thread hits a
blocking system call instruction.
To illustrate the problem, consider two processes using the producer/consumer
sharing model. The processes can use file descriptors and the read and write
calls to pass information to one another. If the consumer calls the blocking
read system call before the producer has produced anything, the call will
block the event queue (while executing the system call instruction) and
deadlock the simulation.
The solution implemented in this changeset is to recognize that the system
calls will block and then generate a special retry fault. The fault will
be sent back up through the function call chain until it is exposed to the
cpu model's pipeline where the fault becomes visible. The fault will trigger
the cpu model to replay the instruction at a future tick where the call has
a chance to succeed without actually going into a blocking state.
In subsequent patches, we recognize that a syscall will block by calling a
non-blocking poll (from inside the system call implementation) and checking
for events. When events show up during the poll, it signifies that the call
would not have blocked and the syscall is allowed to proceed (calling an
underlying host system call if necessary). If no events are returned from the
poll, we generate the fault and try the instruction for the thread context
at a distant tick. Note that retrying every tick is not efficient.
As an aside, the simulator has some multi-threading support for the event
queue, but it is not used by default and needs work. Even if the event queue
was completely multi-threaded, meaning that there is a hardware thread on
the host servicing a single simulator thread contexts with a 1:1 mapping
between them, it's still possible to run into deadlock due to the event queue
barriers on quantum boundaries. The solution of replaying at a later tick
is the simplest solution and solves the problem generally.
This changeset adds the ability to set a close-on-exec flag for a given
file descriptor. It also reworks some of the logic surrounding setting and
retrieving flags from the file description.
Moves aux_vector into its own .hh and .cc files just to get it out of the
already crowded Process files. Arguably, it could stay there, but it's
probably better just to move it and give it files.
The changeset looks ugly around the Process header file, but the goal here is
to move methods and members around so that they're not defined randomly
throughout the entire header file. I expect this is likely one of the reasons
why I several unused variables related to this class. So, the methods are
declared first followed by members. I've tried to aggregate them together
so that similar entries reside near one another.
There are other changes coming to this code so this is by no means the
final product.
The EIOProcess class was removed recently and it was the only other class
which derived from Process. Since every Process invocation is also a
LiveProcess invocation, it makes sense to simplify the organization by
combining the fields from LiveProcess into Process.
When in 64-bit mode, if the stack is accessed implicitly by an instruction
the alternate address prefix should be ignored if present.
This patch adds an extra flag to the ldstop which signifies when the
address override should be ignored. Then, for all of the affected
instructions, this patch adds two options to the ld and st opcode to use
the current stack addressing mode for all addresses and to ignore the
AddressSizeFlagBit. Finally, this patch updates the x86 TLB to not
truncate the address if it is in 64-bit mode and the IgnoreAddrSizeFlagBit
is set.
This fixes a problem when calling __libc_start_main with a binary that is
linked with a recent version of ld. This version of ld uses the address
override prefix (0x67) on the call instruction instead of a nop.
Note: This has not been tested in compatibility mode and only the call
instruction with the address override prefix has been tested.
See [1] page 9 (pdf page 45)
For instructions that are affected see [1] page 519 (pdf page 555).
[1] http://support.amd.com/TechDocs/24594.pdf
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Used cppclean to help identify useless includes and removed them. This
involved erroneously included headers, but also cases where forward
declarations could have been used rather than a full include.